7/29/2020
1
Cop yri gh t © SAS In st it ute In c. Al l ri gh ts re se rv ed .
Top 5 PROC SQL Hacks
Create Your Own Adventure
29 July 2020
Charu Shankar
SAS Institute Canada Inc.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
2
SAS Senior Technical Trainer, Charu teaches by engaging with logic,
visuals and analogies to spark critical thinking. She interviews users
to recommend the right SAS training.
SAS blogger, yoga teacher & chef, Charu also helps support users
looking to land work using SAS through Linkedin.
Charu has presented at over 100 SAS international user group
conferences on SAS programming, SAS Enterprise Guide, PROC SQL,
DS2 programming, Python, Viya etc.
Charu Shankar
Hack # 1: PROC SQL Syntax Order Mnemonic
Hack # 2: Know thy data : Dictionary tables
Hack # 3: Summarizing data using the Boolean Gate
Hack # 4: Where ANSI SQL falls short and PROC SQL steps in
Hack # 5: Find your data Pattern
1
2
7/29/2020
2
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
Hack #1
PROC SQL Syntax Order Mnemonic
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
4
SELECT Statement: Required Clauses
Here are two things that SQL always needs:
1.
What do you want?
The SELECT clause specifies the columns and column order.
2.
Where do you want it from?
The FROM clause specifies the data sources.
You can query from 1 to 256 tables.
SELECT object-item <, ...object-item>
FROM from-list;
3
4
7/29/2020
3
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
5
SELECT Statement: Syntax Order Mnemonic
The WHERE clause specifies data that meets certain conditions.
The GROUP BY clause groups data for processing.
The HAVING clause specifies groups that meet certain conditions.
The ORDER BY clause specifies an order for the data.
SELECT object-item <, ...object-item>
FROM from-list
<WHERE sql-expression>
<GROUP BY object-item <, object-item >>
<HAVING sql-expression>
<ORDER BY order-by-item <DESC>
<, order-by-item>>;
SO
FEW
WORKERS
GO
HOME
ON TIME
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
6
5
6
7/29/2020
4
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
Hack #2
Know Thy Data: DICTIONARY Tables
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
8
DICTIONARY Tables: Overview
DICTIONARY tables are Read-Only metadata views that contain
session metadata, such as information about
SAS libraries, data sets, and external files in use
or available in the current SAS session.
DICTIONARY tables are
created at SAS session initialization
updated automatically by SAS
limited to Read-Only access.
You can query DICTIONARY tables with PROC SQL.
7
8
7/29/2020
5
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
9
Exploring DICTIONARY Tables
describe table dictionary.tables;
NOTE: SQL table DICTIONARY.TABLES was created like:
create table DICTIONARY.TABLES
(
libname char(8) label='Library Name',
memname char(32) label='Member Name',
...
crdate num format=DATETIME informat=DATETIME label='Date Created',
modate num format=DATETIME informat=DATETIME label='Date Modified',
nobs num label='Number of Physical Observations',
obslen num label='Observation Length',
nvar num label='Number of Variables', ...);
You can use a DESCRIBE statement to explore the structure of
DICTIONARY tables:
Partial Log
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
10
Querying Dictionary Information
Display information about the columns in sashelp.cars.
title 'Columns in the sashelp.cars
Table';
proc sql;
select Name,Type,Length
from dictionary.columns
where libname='SASHELP'
and memname='CARS';
quit;
Table names (memnames)
are also stored in uppercase
in DICTIONARY tables.
9
10
7/29/2020
6
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
11
Viewing the Output
PROC SQL Output
Column names are stored in mixed case.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
12
title 'Tables Containing an ID Column';
proc sql;
select memname 'Table Names', name
from dictionary.columns
where libname='SASHELP' and
upcase(name) contains 'ID';
quit;
Using Dictionary Information
Which tables contain an ID column?
Because different tables might use different cases
for same-named columns, you can use the UPCASE
function for comparisons. However, this significantly
degrades the performance of the query.
11
12
7/29/2020
7
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
13
Viewing the Output
All ID column names are stored in uniform uppercase, so
the UPCASE function is not needed the next time that a
query such as this is executed.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
14
Finding Common Column Names Dynamically
All of the previous techniques to explore DICTIONARY tables
work when you know the names of columns.
What happens if you do not know your data, and you want SAS
to retrieve all same-named columns in a library.
Use the following code
title 'Common columns in SASHELP';
proc sql;
select name, type, length, memname
from dictionary.columns
where libname='SASHELP’
group by name
having count(name) > 1;
13
14
7/29/2020
8
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
15
Viewing the Output
Joins are easier because the structure of each table does not have to be
examined before determining common columns. Let SAS bring common
columns dynamically by looking up DICTIONARY tables.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
16
15
16
7/29/2020
9
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
Hack # 3
Summarizing Data using the Boolean Gate
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
18
Business Scenario
Create a report that shows the total number of current Orion Star
employees and a report that shows the total number of current CYOA
managers.
PROC SQL
Desired Report 1
CYOA.employee_information
Count
ƒƒƒƒƒƒƒ
Count
ƒƒƒƒƒƒƒ
Desired Report 2
17
18
7/29/2020
10
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
19
Business Scenario
Create a report that lists the following for each department:
total number of managers
total number of non-manager employees
manager-to-employee (M/E) ratio
Below is a rough sketch of the desired report.
Department Managers Employees
M/E
Ratio
Accounts 1 5 20%
Administration 2 20 10%
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
20
Business Data
Determine whether an employee is a manager
or a non-manager.
The Job_Title column contains the information about each
employee.
Department Job_Title
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Administration Administration Manager
Administration Secretary I
Administration Office Assistant II
19
20
7/29/2020
11
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
21
Counting Rows That Meet a Specified Criterion
How do you determine the rows that do have Manager
in Job_Title, as well as rows that do not? You cannot use a WHERE clause to
exclude either group.
Use the FIND function in a Boolean expression to identify rows that contain
Manager in the Job_Title column.
Department Job_Title
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Administration Administration Manager
Administration Secretary I
Administration Office Assistant II
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
22
FIND Function
The FIND function returns the starting position of the first
occurrence of a substring within a string (character value).
Find the starting position of the substring Manager in the character variable
Job_Title.
The value returned by the FIND function is 16.
Job_Title 1 2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
A d m i n i s t r a t i o n M a n a g e r
find(Job_Title,"manager","i")
FIND(string, substring<,modifier(s)><,startpos>)
21
22
7/29/2020
12
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
23
Using Boolean Expressions
Part 1: Use a Boolean expression to determine whether
an employee is a manager.
Note: Boolean expressions evaluate to true (1) or false (0).
If Job_Title contains Manager, the value is 1.
If Job_Title does not contain Manager, the value is 0.
proc sql;
select Department, Job_Title,
(find(Job_Title,"manager","i")>0)
"Manager"
from CYOA.employee_information;
quit;
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
24
Viewing the Output
Partial PROC SQL Output
Department Job_Title Manager
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Administration Administration Manager 1
Administration Secretary I 0
Administration Office Assistant II 0
Administration Office Assistant III 0
Administration Warehouse Assistant II 0
Administration Warehouse Assistant I 0
Administration Warehouse Assistant III 0
Administration Security Guard II 0
Administration Security Guard I 0
Administration Security Guard II 0
Administration Security Manager 1
23
24
7/29/2020
13
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
25
Using Boolean Expressions
Part 2: Calculate the statistics requested.
proc sql;
title "Manager-to-Employee Ratios";
select Department,
sum((find(Job_Title,"manager","i")>0))
as Managers,
sum((find(Job_Title,"manager","i")=0))
as Employees,
calculated Managers/calculated Employees
"M/E Ratio" format=percent8.1
from CYOA.employee_information
group by Department;
quit;
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
26
Viewing the Output
PROC SQL Output
Manager-to-Employee Ratios
M/E
Department Managers Employees Ratio
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Accounts 3 14 21.4%
Accounts Management 1 8 12.5%
Administration 5 29 17.2%
Concession Management 1 10 10.0%
Engineering 1 8 12.5%
Executives 0 4 0.0%
Group Financials 0 3 0.0%
Group HR Management 3 15 20.0%
IS 2 23 8.7%
Logistics Management 6 8 75.0%
Marketing 6 14 42.9%
Purchasing 3 15 20.0%
Sales 0 201 0.0%
Sales Management 5 6 83.3%
Secretary of the Board 0 2 0.0%
Stock & Shipping 5 21 23.8%
Strategy 0 2 0.0%
25
26
7/29/2020
14
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
27
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
Hack # 4
Where ANSI falls short and PROC SQL steps in
27
28
7/29/2020
15
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
29
Business Scenario
Tom Zhou is a sales manager who needs access to personnel
information for his staff.
Tom Zhou
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
30
Business Data
The data that Tom needs is name, job title, salary,
and years of service. This data is contained in three tables.
CYOA.employee_addresses
CYOA.employee_payroll
CYOA.employee_information
29
30
7/29/2020
16
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
31
What Is a PROC SQL View?
A PROC SQL view
is a stored query
contains no actual data
can be derived from one or more tables, PROC SQL views,
DATA step views, or SAS/ACCESS views
extracts underlying data each time it is used and accesses
the most current data
can be referenced in SAS programs in the same way as a
data table
cannot have the same name as a data table stored
in the same SAS library.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
32
Partial SAS Log
5 proc sql;
46 create view CYOA.tom_zhou as
47 select Employee_Name as Name format=$25.0,
48 Job_Title as Title format=$15.0,
49 p.Salary 'Annual Salary' format=comma10.2,
50 int((today()- p.Employee_Hire_Date)/365.25)
51 as YOS 'Years of Service'
52 from employee_addresses as a,
53 employee_payroll as p,
54 employee_information as o
55 where a.Employee_ID=p.Employee_ID and
56 o.Employee_ID=p.Employee_ID and
57 Manager_ID=120102;
NOTE: SQL view CYOA.TOM_ZHOU has been defined.
Creating a PROC SQL View
31
32
7/29/2020
17
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
33
Location of a PROC SQL View
ANSI standards specify that the view must reside
in the same SAS library as the contributing table or tables.
libname=CYOA
Employee_Addresses
Tom_Zhou
s:\workshop
Employee_Information
Employee_Payroll
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
34
Location of the Source Tables: ANSI
In PROC SQL, the default libref for the table (or tables)
in the FROM clause is the libref of the library that contains the view.
When the view and data source are in the same location, you specify a
one-level name for the table (or tables) in the FROM clause.
create view CYOA.tom_zhou as
from employee_addresses as a,
employee_payroll as p,
employee_information as o
33
34
7/29/2020
18
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
35
Business Scenario
You created a PROC SQL view to provide Tom Zhou access to personnel data
for his direct reports.
Tom copied his view to a folder on his hard drive.
Now Tom reports that the view does not work anymore, and he asked for
your help to resolve the problem.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
36
Exploring the Problem
Tom submitted the following:
libname CYOA 'c:\temp';
proc sql;
title "Tom Zhou's Direct Reports";
title2 "By Title and Years of Service";
select *
from CYOA.tom_zhou
order by Title desc, YOS desc;
quit;
title;
s107d09
35
36
7/29/2020
19
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
37
Viewing the Log
Partial SAS Log
libname CYOA ‘c:\workshop';
NOTE: Libref CYOA was successfully assigned as follows:
Engine: V9
Physical Name: c:\workshop
proc sql;
title "Tom Zhou's Direct Reports";
title2 "By Title and Years of Service";
select *
from CYOA.tom_zhou
order by Title desc, YOS desc;
ERROR: File CYOA.EMPLOYEE_ADDRESSES.DATA does not exist.
ERROR: File CYOA.EMPLOYEE_PAYROLL.DATA does not exist.
ERROR: File CYOA.EMPLOYEE_INFORMATION.DATA does not exist.
quit;
title;
NOTE: The SAS System stopped processing this step because of
errors.
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
38
A Violation
Tom moved his view to his C:\workshop folder and redefined the CYOA library
there. This violated the one-level naming convention in the FROM clause
in the view code.
libname CYOA ‘c:\workshop';
proc sql;
title "Tom Zhou's Direct Reports";
title2 "By Title and Years of Service";
select *
from CYOA.tom_zhou
order by Title desc, YOS desc;
quit;
37
38
7/29/2020
20
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
39
Making a View Portable
create view CYOA.Tom_Zhou as
select Employee_Name as Name format=$25.0,
Job_Title as Title format=$15.0,
p.Salary "Annual Salary" format=comma10.2,
int((today()-p.Employee_Hire_Date)/365.25)
as YOS 'Years of Service'
from CYOA.employee_addresses as a,
CYOA.employee_payroll as p,
CYOA.employee_information as o
where a.Employee_ID=p.Employee_ID and
o.Employee_ID=p.Employee_ID and
Manager_ID=120102
using libname CYOA "s:\workshop";
CREATE VIEW view AS SELECT…
<USING LIBNAME-clause<, LIBNAME-clause>>;
A USING clause names the
location of the tables.
two-level data
set names
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
40
libname=CYOA
employee_addresses
s:\workshop
Two-Level Table Names in Permanent Views
The USING clause libref is local to the view,
and it will not conflict with an identically named libref
in the SAS session.
When the query finishes, the libref is disassociated.
CREATE VIEW proc-sql-view AS SELECT
<USING LIBNAME-clause<, LIBNAME-clause>>;
libname=CYOA
tom_zhou
c:\workshop
employee_information
employee_payroll
39
40
7/29/2020
21
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
41
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
Hack # 5
Find Your Data Pattern
41
42
7/29/2020
22
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
43
Business Scenario
Find data that matches a pattern. HS10_ column has a series of any 10 or 6 digit
numbers. An additional challenge- this series never appears in the same position”.
Sample of the data
Sticks or profile shapes of subheading 3916.10
Reproduction proofs for the production of printing plates, rolls, of tariff item No. 8442.50.20
Microcopies of tariff item No. 4903.00.10, 4905.91.00, 4911.10.10 or 4911.10.20
5. FIND YOUR DATA PATTERN
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
44
proc sql;
select * from pattern
where prxmatch(("/\d{4}\./"),HS10_TSCHED_EDESC)> 0;
5. RECOGNIZE PATTERNS IN DATA
/ I used forward slashes as default Perl delimiters.
\d matches a digit 0 to 9
{n} matches the previous expression n times, \d{4} matches any 4 digits
\. Is the pattern to match a period
Of the original 12,865 rows, SAS found 52 rows that matched our pattern.
43
44
7/29/2020
23
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
45
5. RECOGNIZE PATTERNS IN DATA
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
46
45
46
7/29/2020
24
Copyr igh t © SA S I ns ti tu te In c . All ri gh ts re se rve d .
Handy Links
Proc sql views
Fullstimer system option
SAS 9.4 Proc sql users guide
Find your data pattern with PERL
PERL Regular Expressions cheatsheet
Using The Boolean operation in PROC SQL
Working with Subquery in the SQL procedure
Go home on time with these 5 PROC SQL tips
Know thy data: Dictionary tables SAS Global Forum Paper
Step by step PROC SQL – SAS Global forum 2020 virtual event
HANDY RESOURCES
Cop yri gh t © SAS In st it ute In c. Al l ri gh ts re se rv ed .
Thank you
Questions & Comments
47
48