Notes
Introduction
Languages we’ll be using:
- C
- Lisp
- Scala
- Prolog
- Python
Topics:
- syntax
- semantics of functions, scopes, symbols
- composition
- OOP, FP, declarative
- scripting
- pragmatics
- range of use
- design automation
- performance
Syntax
One thing every language shares is a set of rules for expressing a program.
They also have some set of semantics. The line int a = 1
is legal syntax in
many languages (C, Java, etc.)
Syntax | Semantics | Implementation |
---|---|---|
int a = 1 |
There exists a storage location a , |
allocate space, |
set its value to 1 |
initialize to 1 |
Feature | Java | C | C++ |
---|---|---|---|
classes | Yes | No | Yes |
objects | Yes | Not really | Yes |
GC | Yes | No | No |
scopes | Yes | Yes | Yes |
locals | Yes | Yes | Yes |
refrences | Yes | No, Pointers | Both |
bad language | Yes | No | Yes |
Syntactic specs
- tokens
+
}
123
fred
- grammar
- context-free grammar
S -> IfStat | AssignStat
IfStat -> if ( Exp ) S
AssignStat -> ID = Exp
Exp -> Var | Exp + Exp | Exp * Exp | (Exp)
- CFGs can be ambiguous in:
- precedence
a + b * c
a + (b * c)
(a + b) * c
- associativity
a - b - c
a - (b - c)
- right associative
(a - b) - c
- left associative (most common)
a ** b ** c
a ** (b ** c)
- most common
(a ** b) ** c
a = b = c = 12
((a = b) = c) = 12
- doesn’t make sense in most languages
a = (b = (c = 12))
- precedence
Args -> Exp | Exp, Args
Stats -> Stat; | Stat; Stats
- context-free grammar
Bitwise
- C was the first language to provide an interface to bitwise operations like
people are used to today
&
bitwise AND|
bitwise OR^
bitwise XOR~
bitwise NOT
int a = 2; /* 00000000000000000000000000000010 */
a |= 0x4; /* 00000000000000000000000000000110 */
a |= 0xff00; /* 00000000000000001111111100000110 */
if ((a & 2) != 0) /* if 2nd bit is not 0 (ON) */
#define APPLES 0x2
if (a & APPLES) /* same thing */
/* unset bit */
a &= ~0x4 /* 00000000000000001111111100000010 */
int
isodd(int x)
{
return x & 0x1;
}
int
multiplyby2n(int x, int n)
{
return x << n;
}
unsigned
unsigned int a = -1;
unsigned int b = -2;
(a < b) /* => true */
(a >> b) /* depends on whether a and b are signed/unsigned (confusing) */
- java learned from C’s
>>
mistake and created two operators>>
signed bit shift right>>>
unsigned bit shift right
- logical operators
&&
||
- tells you something about K+R’s thinking, since they made the bitwise operators more convenient to write than the logical ones
- some ambiguity with precidence
if (a || b & c) /* vs */ if (a || (b & c)) /* vs */ if ((a || b) & c)
- switch
- the entire east coast AT&T network went down for a half a day once because of an error due to the switch syntax in C
switch(a) {
case 1: printf(...); /* you probably wanted to break here */
case 2: return ...;
default: break;
}
/*
if a == 1, print, then return.
if a == 2, return.
else, break
*/
- referencing functions before they’re defined
- was too expensive when C was created
int
f(int x)
{
return g(x);
}
int
g(int x)
{
return 12;
}
/* Not a fatal error but may as well be */
/* Can be fixed by adding `int g(int)` before declaring `f` */
int
main(char** argv, int argc) /* can just be main() */
{
printf("%d", f(x));
/* can return nothing, which means return whatever happened to be in
the memory location that would have been used for an actual return
value */
}
- forcing you to have declarations makes the compiler faster
- having all of those declarations at the head of your program sucks
- thus
.h
(header) files were born- all declarations get placed in a file with a
.h
extension - you include that file in your
.c
program with#include file.h
- all declarations get placed in a file with a
C
int a = 12;
valid C statement, but what’s the context?
static int b = 13; // global
void f() {
int a = 12; // local
/* could also write `auto int a = 12;` to emphasize that it's local */
g();
}
main() {
f();
}
Stack |
---|
main |
f |
g |
- Data Types
long
/long int
- has at least as many bits as
int
- C99 introduced
long long
which is guaranteed to be 64-bits
- has at least as many bits as
int
- has at least as many bits as
short
- has at least as many bits as
short
- has at least as many bits as
char
- has at least as many bits as
char
- at least 8-bits
- if a machine’s byte is at least 8-bits,
char
is the size of a byte - C does no guarantees about the character encoding
float
- has at least as many bits as
int
(just about always 32-bits)
- has at least as many bits as
double
- twice the length of a float (probably?)
long double
- has at least as many bits as
double
- has at least as many bits as
- structs
struct
point
{
int x;
int y;
}
struct
shape
{
struct point origin;
int length;
/* ... */
}
void
draw(struct point *p)
{
moveto(p->x, p->y);
/* ... */
}
main()
{
/* local struct, which goes away when `main` terminates */
struct point pt;
pt.x = foo; pt.y = bar;
draw(&pt); /* unary & is the address-of operator */
/* struct with allocated memory, which remains after `main` terminates */
struct point *p = (struct point *) malloc(sizeof(struct point));
p->x = foo; p->y = bar;
draw(p);
/* deallocate the memory used for `p` */
free(p);
}
- pointers
main()
{
int a = 12, b = 17;
swap(a, b);
}
/* does not work, because everything is local */
void
swap(int x, int y)
{
int tmp = x;
x = y;
y = tmp;
}
/* does work, thanks to pointers */
void
swap(int *x, int *y)
{
int tmp = *x;
*x = y;
*y = tmp;
}
/* now we have to change the call to `swap` */
main()
{
int a = 12, b = 17;
swap(&a, &b);
}
- you cannot write
swap
in Java, because you cannot manipulate pointers - allowing pointer manipulation opens the door to a lot of bad things
struct point * newOrigin() {
struct point pt;
pt.x = foo; pt.y = bar;
return &pt; /* BOOM, security hole */
}
-
never access the memory location of a local unless you’re doing something like swap, because those locals disappear when the function terminates, and now you’re treading in mirky waters
-
arrays and pointers
int a[10];
/* lookin good */
a[0] = 1;
/* still good */
a[1] = 4;
/* uh oh... */
a[29] = 6;
- this is all valid C code
- array bracket notation is actually syntactic sugar for pointer notation
- the following code is equivalent to the previous code
int a[10];
*(&a + 0*sizeof(int)) = 1;
*(&a + 1*sizeof(int)) = 4;
*(&a + 29*sizeof(int)) = 6;
int
sum(int a[], int n)
{
int s = 0, i;
for(i = 0; i < n; i++)
s += a[i];
return s;
}
/* equivalent */
int
sum(int *a, int n)
{
int s = 0, i;
for(i = 0; i < n; i++)
s += *(a + i*sizeof(int));
return s;
}
- pointers, arrays, and strings
- no such thing as a string type
- instead, we have the convention that a string is a
char
array, terminated by a null byte, (0
or'\0'
for emphasis) "hello" == { 'h', 'e', 'l', 'l', 'o', '\0' }
int contains(char x, const char *s)
- in Java we might write
boolean contains(char x, String s) {
for (int i = 0; i < s.length; s++) {
if (s.charAt(i) == x)
return true;
}
return false;
}
- in C we don’t know the length ahead of time, so we would have to do something like
int
contains(char x, const char *s)
{
int i;
for(i = 0; s[i] != 0; i++) {
if (s[i] == x)
return 1;
}
return 0;
}
- or with pointers we could do
int
contains(char x, const char *s)
{
while (*s) {
if (*s++ == x)
return 1;
}
return 0;
}
- string length
assert strlen("hello") == 5;
int
strlen(char *s)
{
int i;
for (i = 0; *s++; i++);
return i;
}
int
strlen(char *s)
{
int i;
while (*s++)
i++;
return i;
}
- standard libraries
#include <std.h>
#include <stdio.h>
#include <stdib.h>
#include <unistd.h>
#include <string.h>
#include "myheader.h"
<string.h>
char * strsave(char *s)
, returns a new copy of a string
char *
strsave(char *s)
{
int n = strlen(s);
char *p = (char *) malloc(sizeof(char) * (n+1));
char *t = p;
while (*t++ = *s++);
return p;
}
- somewhere down the line, you probably won’t need
p
anymore
char *p = strsave(s);
/* ... */
free(p);
strcpy(char *src, char *dst)
, does the same thing asstrsave
, but using a provided destination string, instead of one it creates- variadic functions
varargs.h
void printf(formatter, ...)
- unions
union intOrFloat { int i; float f; }
intOrFloat x;
x.f = 1.0;
x.i = 3;
- bit fields
- great when they work, but useless in a multithreading context
struct
deviceRegister
{
int dmaMode : 2;
int status : 3;
}
- typedefs
typedef int pid_t; /* `_t` means "type that is not a pointer", by convention */
typedef unsigned int uint32; /* in <unistd.h> */
typedef struct cell * cell_p; /* `_p` means "pointer type", by convention */
- macros
static final int STRING_CAP = 256
#define STRING_CAP 256
/* what you see */
void f() {
char line[STRING_CAP];
}
/* what the compiler sees */
void f() {
char line[256];
}
#define BEGIN {
#define END }
void f() BEGIN
char line[STRING_CAP];
END
int x, y;
int z = x > y ? x : y;
#define MAX(X, Y) X > Y ? X : Y
int x, y;
int z = MAX(x, y);
/* expands to */
int z = x > y ? x : y;
/* but */
int z = MAX(x++, y++);
/* expands to */
int z = x++ > y++ ? x++ : y++;
/* which increments one variable twice, and the other once */
#define MAX(X, Y) do { _typeof(X) a=X; b=Y; a > b ? a : b } while(0)
/** defines local variables `a` and `b`, to avoid the whole `x++` issue,
* but now we have semicolons, as well as extra variables `a` and `b`,
* so we need to put it all in curly braces. Sometimes plain curly braces
* don't work, but putting it in a `do {} while(0)` always works.
**/
/* one last thing */
#define MAX(X, Y) (X) > (Y) ? (X) : (Y)
/* put parens around everything in the expansion, to avoid clashes like */
2+MAX(1,2);
/* which would expand to */
2+1 > 2 ? 1 : 2;
/* when we really want */
2+(1 > 2 ? 1 : 2);
#define LINUX
#ifdef (LINUX)
/* linux only stuff */
#else
/* non-linux only stuff */
#error
#endif
# you can run the C preprocessor on any language
$ gcc -E prog.java
# there's a standalone tool just for that
$ m4 prog.java
Lisp
- two branches of programming languages
- bottom-up
- start with the machine, work a language out of that
- C family of languages
- top-down
- start with a model, build a language from that, and finally implement it
- Lisp, Haskell, Prolog
- bottom-up
- early languages
- Lambda Calculus
- two main operations
- abstraction
(λx.f(x))
- application
(λx.f(x))y
- abstraction
(λx.x)y => y
identity function- need some primitive operations in order to do anything other than
the identity function, so let’s assume we have
+
andif
defined (λx.λy.y+x)ab => a+b
- two main operations
- Lisp
> (defun id (x) x) ; abstraction form
ID
> (id 2) ; application form
2
;; both are S-expressions
> (defun twice (x) (+ x x))
TWICE
> (twice 2)
4
> (defun add (x y) (+ x y))
ADD
> (add 4 2)
6
> (defun max (x y)
(if (> x y)
x
y))
MAX
> (max 1 2)
2
> (max 2 1)
2
- who needs loops?
> (defun sum-of (n)
(cond
((= n 0) 0)
(T (+ n (sum-of (1- n))))))
SUM-OF
> (sum-of 5)
15
- list processing
'(1 2)
- list of1
and2
'()
- empty list or()
orNIL
'(1)
- list of1
(defun count (l)
(if (null l)
0
(1+ (count (rest l)))))
REPL
- read
- eval
- loop
(defun eval (exp)
(cond
((is-self-evaluating exp) exp)
((eq (car exp) 'quote) (cdr exp))
((is-a-function (car exp)) (eval (subst (body (car exp))
(cdr exp))))
((is-car exp) (first exp))
((is-cdr exp) (rest exp))
((is-cond exp) (for-each exp eval))
(T (ERROR))))
Side Effects
(setq glob 0)
(defun f (x)
(progn
(setq glob x) ; mortal sin
(print x) ; the only reason you should ever need progn
(+ x 1)))
; can mutate freely
(setq glob 0)
; cannot mutate (implementation defined)
(defconstant BUFSIZE 1024)
; can only set once (implementation defined)
(defparameter retries 2)
setq
- in the original implementation of lisp
setf
- if the var is global:
setq
- if the var is the car of a list:
set-car
- if the var is in a let: i dunno, set it anyway?
- if the var is global:
Locals
(defun sq1 (x)
"Squares x+1"
(* (+ x 1)
(+ x 1)))
; looks inelegant, so lets bind (+ x 1) to something
(defun sq1* (x)
"Squares x+1"
(let ((y (+ x 1)))
(* y y)))
; let literally expands to the following (minus the naming and comments)
(defun sq1** (x)
"Squares x+1"
(sq** (+ x 1)))
(defun sq** (x)
"Squares x"
(* x x))
Symbols, Bindings, Scopes
- symbols
- identifiers
- strings
- names
- bindings
- a mapping from symbol to value or storage location
- in java we have
- locals on the stack
- static bindings
- can’t read dl’s handwriting (there’s a dynamic in there)
- in lisp we have
- either pointers to storage locations
- or functions of no arguments which return a specific value
- they’re almost the same
- lifetimes
- scoped (death outside the scope)
- static (permanent)
- unbounded (garbage collected)
- managed
/* legal java */
class a {
static int a;
int a(int b) {
int a = b;
return a(a) + a.a;
}
}
(setq a 2) ; doing the devil's work here
(defun f (a)
(setq a 3))
(f a) => 3
a => 2
- this doesn’t work as it does because of dynamic scoping, but because of rebinding
Aliasing
class Point {
int x, y;
Point(int x, int y) {
this.x = x;
this.y = y;
}
/* nothing to see here */
static Point addPoints(Point a, Point b) {
return new Point(a.x + b.x,
a.y + b.y);
}
/* here's where things get weird */
static void addPoints(Point a, Point b, Point dest) {
dest.x = a.x;
dest.x += b.x;
dest.y = a.y;
dest.y += b.y;
}
}
class Main {
public static void main(String[] args) {
Point p = new Point(1,2);
Point.addPoints(p, p, p);
assert p.equals(new Point(2,4)); // addition works here
}
}
class Matrix {
static void multiply(Matrix a, Matrix b, Matrix dest) {
/* ... */
}
}
class Main {
public static void main(String[] args) {
Matrix m = /* ... */;
Matrix.multiply(m, m, m); /* will make garbage */
}
}
Hygenic macros
#define SWAP(X,Y) typeof(x) t = (x); x = (y); y = (t)
main () {
int a = 2, b = 3;
SWAP(a,b); /* fine */
/* --- */
int a = 2, t = 4;
SWAP(a,t); /* ERROR: t already defined */
}
- we need our macros to use identifiers which are not used elsewhere
- lisp has
gensym
- lisp has
(defmacro head (l)
`(car ,l))
- (`) backquote / backtick
- (
,
) unquote
;; (omit 2 '(1 2 3)) => (1 3)
(defun omit (elem lst)
(when lst
(let ((head (car lst)))
(if (eq elem head)
(omit elem (cdr lst))
(cons head (omit (cdr lst)))))))
- tail recursion
;; non-tail recursive
(defun fact (x)
(if (zerop x)
1
(* x (fact (1- x)))))
;; tail recursive
(defun fact* (x)
(fact-iter* x 1))
(defun fact-iter* (x sofar)
(if (zerop x)
sofar
(fact (1- x) (* x sofar))))
;; also tail recursive
(defun fact** (x)
(fact-iter x 1 1))
(defun fact-iter** (bound count sofar)
(if (= bound count)
sofar
(fact-iter** bound (1+ count) (* count sofar))))
(defn fact [x]
(loop [x x, sofar 1]
(if (zero? x)
sofar
(recur (dec x) (* x sofar)))))
Types
what are they?
ideas:
- description of a location
- tells you what you can do
- permanent attribute?
- preconditions and post conditions
;; LISP ;;
(* 1 "hi mom") ;; run-time error ;;
/* JAVA */
1 * "hi mom" /* compile-time error */
''' PYTHON '''
1 * "hi mom" # evaluates to "hi mom"
Generic Types
class Stack<T> {
void push(T x) {}
T pop() {}
}
/* ... */
class Person {}
class Student extends Person {}
class Boss extends Person {}
static void main() {
Stack<Person> s = new Stack<Person>();
s.push(new Person());
s.push(new Student());
/* everything was good up till here */
Student p = s.pop();
}
/* to allow for what we did above, java lets you do */
void push(<? extends T> x) {}
In scala, it’s even simpler
class Stack[T] {
/* +T is the same as <? extends T> */
def push(x: +T): Unit = {}
}
Java also has <? super T>
, which is written as [-T]
in Scala.
These two things are called covariance and contravariance
trait Function[-A, +R] {
def apply(x: A): R = { /* ... */ }
}
/* for instance */
val f: Function[Number, Long]
/* takes any number, but always returns a long */
Declarative Languages
“what, not how”
- database query languages
- SQL
select * from people
where name = smith
- logic programming
- “proofs = programs”
- languages
- prolog
- datalog (SQL + prolog)
a
is married to b
;
a
is the parent of c
, e
, and f
;
c
is the parent of d
.
wed(a, b).
par(a, c).
par(a, e).
par(a, f).
m(b).
f(a).
f(c).
mother(X, Y) :- par(X, Y), f(X).
married(X, Y) :- wed(X, Y).
married(X, Y) :- wed(Y, X).
parent(X, Y) :- par(X, Y).
parent(X, Y) :- married(X, Z), par(Z, Y).
grandparent(X, Y) :- parent(X, Z), parent(Z, Y).
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
sibling(X, Y) :- parent(Z, X), parent(Z, Y).
sister(X, Y) :- sibling(X, Y), f(X).
brother(X, Y) :- sibling(X, Y), m(X).
aunt(X, Y) :- parent(Z, Y), sister(X, Z).
uncle(X, Y) :- parent(Z, Y), brother(X, Z).
cousin(X, Y) :- grandparent(Z, X), grandparent(Z, Y).
Curry–Howard Isomorphism
- Church–Turing Thesis
- computable functions
- Curry–Howard Isomorphism
- relation between types and logic
- “The typed lambda calculus is just logic in disguise” – D.L.
- propositions are types
- type checking is proof
- proofs are programs
State
- procedural
- variables
- functional
- propogate values
- relational/logic
- prove properties of values
How does IO map to these ideas?
- procedural
- read as variables, write as variables
- functional
- a stream of values
- monads
- still true functions, because they have a closure of the history of values
- relational/logic
- Prolog basically says “here’s the IO module, it’s really there for debugging so do whatever”
- Oz is a logic programming language which also has objects
Encapsulation
- insides vs outsides
- the entire idea behind an API
- why?
- control
- creates two classes of programmers
- component authors
- component users
- creates two classes of programmers
- specs/standardization
- control
- ADT (abstract data type)
- you have values and operations
- closed under the operations
- classic example is complex numbers
- real and imaginary parts
- arithmetic operations
- objects/classes
- in a sense, a superset of ADTs
- have both imports and exports
- if your
Shape
class has aPoint
field, it is depending on that externalPoint
class
- if your
- configuration
- if your class has a
Set
field, do you use aHashSet
or aTreeSet
? - decisions, decisions
- factories and polymorphism exist to delay these decisions
- if your class has a
- can you have objects without classes?
- the answer is yes
- the first language to support this idea was Self
- strategies
- objects + interfaces + factories
- Emerald
- distributed language
- every object was on a different computer
- classes didn’t really make sense in this context
- Emerald
- objects + clone + add feature (prototype)
- JavaScript
- objects + interfaces + factories
Scripting, Dynamic, and Domain-Specific Languages
Scripting Languages
- python, ruby, javascript, php, bash/sh
- characteristics
- interpreters
- declarative overhead
- dynamically typed
- inferred types
- optionally typed
- Dart
- Cosmos (brand new)
- Swift (laaaame)
- sugar
- syntax vs APIs
- entirely about human factors
Domain-Specific Languages
- vectors/matrices
- APL (a programming language)
A ← B × C
- web pages
- php
<li select * from database>
- UI
- javascript
onMouseOver( /* ... */ )
- gains
- expressiveness
- smaller learning curve
- losses
- modularity
- encapsulation
- scalability
- performance