Rise beyond the trivial: July 2013

When we declare a variable of column as varchar2 datatype, we specify it size in brackets. For example consider following create table statement. Here what does 10 means?

CREATE TABLE test(
col1 VARCHAR2(10)
)

We all know that the VARCHAR2 datatype stores variable-length character strings. When you create a table with a VARCHAR2 column, you specify a maximum string length (in bytes or characters) between 1 and 4000 bytes for the VARCHAR2 column.

The length semantics of character datatypes can be measured in bytes or characters. For example:

CREATE TABLE test(
col1 VARCHAR2(10),
col2 VARCHAR2(10 BYTE),
col3 VARCHAR2(10 CHAR)
)

Here column col1 and col2 are of 10 bytes length and col3 is of 10 character length.

Byte semantics treat strings as a sequence of bytes. This is the default for character datatypes. we can change this default behavior by modifying NLS_LENGTH_SEMANTICS parameter.
Character semantics treat strings as a sequence of characters.

For a single byte character set, column defined in character semantics are same as those defined in byte semantics. For multi-byte characters things becomes interesting. For multi-byte characters character semantics is useful. Character semantics reduces the complexity when defining the actual length requirements for data storage. For example, if we have to define a VARCHAR2 column that can store up to five Chinese characters together with five English characters. Assuming Unicode database (UTF8) in byte semantics, this would require (5*3 bytes) + (1*5 bytes) = 20 bytes; in character semantics, the column would require 10 characters.

Following sequence of examples will clear things and also teach us how to change the default behaviuor of byte semantics by altering NLS_LENGTH_SEMANTICS parameter.

SELECT VALUE
FROM v$parameter
WHERE upper(NAME) LIKE '%NLS_LENGTH_SEMANTICS%'; --either byte of char.

SQL> alter SESSION set nls_length_semantics='CHAR';

Session altered
SQL> drop table test1;

Table dropped
SQL> CREATE TABLE test1(
2 a VARCHAR2(5),
3 b VARCHAR2(5 CHAR),
4 c VARCHAR2(5 BYTE)
5 );

Table created
SQL> SELECT column_name, data_length, char_length
2 FROM user_tab_columns
3 WHERE table_name = 'TEST1';

COLUMN_NAME DATA_LENGTH CHAR_LENGTH
------------------------------ -------------------------- -----------
A 20 5
B 20 5
C 5 5
SQL> INSERT INTO test1(a,b,c)
2 VALUES ('india','india','india');

1 row inserted
SQL> INSERT INTO test1(a,b,c)
2 VALUES ('šovic','šovic','šovic');

INSERT INTO test1(a,b,c)
VALUES ('šovic','šovic','šovic')

ORA-12899: value too large for column "SYSTEM"."TEST1"."C" (actual: 6, maximum: 5)
SQL> INSERT INTO test1(a,b,c)
2 VALUES ('šovic','šovic','šovi');

1 row inserted
SQL> SELECT DUMP('šovic'),LENGTHB('šovic') ,LENGTH('šovic')
2 FROM dual;

DUMP('ŠOVIC') LENGTHB('ŠOVIC') LENGTH('ŠOVIC')
-------------------------------------------------------- ------------------------------- ---------------
Typ=96 Len=6: 197,161,111,118,105,99 6 5

SQL> alter SESSION set nls_length_semantics='BYTE';

Session altered
SQL> CREATE TABLE test2(
2 a VARCHAR2(5),
3 b VARCHAR2(5 CHAR),
4 c VARCHAR2(5 BYTE)
5 );

Table created
SQL> SELECT column_name, data_length, char_length
2 FROM user_tab_columns
3 WHERE table_name = 'TEST2';

COLUMN_NAME DATA_LENGTH CHAR_LENGTH
------------------------------ ----------------------------- ------------------------
A 5 5
B 20 5
C 5 5
SQL> INSERT INTO test2(a,b,c)
2 VALUES ('india','india','india');

1 row inserted
SQL> INSERT INTO test2(a,b,c)
2 VALUES ('šovic','šovic','šovic');

INSERT INTO test2(a,b,c)
VALUES ('šovic','šovic','šovic')

ORA-12899: value too large for column "SYSTEM"."TEST2"."A" (actual: 6, maximum: 5)
SQL> INSERT INTO test2(a,b,c)
2 VALUES ('šovi','šovic','šovic');

INSERT INTO test2(a,b,c)
VALUES ('šovi','šovic','šovic')

ORA-12899: value too large for column "SYSTEM"."TEST2"."C" (actual: 6, maximum: 5)
SQL> INSERT INTO test2(a,b,c)
2 VALUES ('šovi','šovic','šovi');

1 row inserted

References:

Oracle® Database Concepts
11g Release 1

SQL server does not have rownum pseudo column. So if we would like to fetch row number along with select statement we have to use a ranking function ROW_NUMBER.

In oracle we can achieve same thing very easily using rownum. for example:

CREATE TABLE test1
AS
SELECT table_name, column_name
FROM user_tab_columns;

SELECT ROWNUM, table_name, column_name
FROM test1;

But SQL Server does not have rownum like pseudo column. So instead we have to do something like this:

SELECT OBJECT_NAME(object_id) as table_name, name as column_name
INTO test1
FROM sys.columns;

SELECT ROW_NUMBER() OVER (ORDER BY table_name, column_name) as rownum,
table_name, column_name
FROM test1;

Now row_number() is an anaytical function and it also has partition by clause just like its oracle counter-part. So if you would like to partition/divide the result set and show rownum based on each partition you can do that. In the above example lets say you would like to show row number by each table name. Then following query will do just like that:

SELECT ROW_NUMBER() OVER (PARTITION BY table_name ORDER BY table_name, column_name) as rownum,
table_name, column_name
FROM test1;

Rise beyond the trivial

Monday, July 15, 2013

ORACLE: Meaning of string length in varchar2 datatype.

References:

Thursday, July 11, 2013

SQL Server: Oracle's ROWNUM equivalent in SQL server.