
CC BY

DOI: 10.15514/ISPRAS-2021-33(3)-5

Localized Lama Gradual Typing

V.S. Kryshtapovich, ORCID: 0000-0002-3941-6201 <[email protected]>

ITMO University, Kronverksky Pr. 49, bldg. A, St. Petersburg, 197101, Russia

Abstract. Gradual typing is a modern approach for combining the benefits of static and dynamic typing. Although scientific research aims for soundness of type systems, many languages intentionally make their type systems unsound to improve performance. This paper describes an implementation of a dialect of the Lama programming language that supports gradual typing with explicit annotation of dangerous parts of the code. The goal of the current implementation is to grant type safety to programs while keeping the expressive power of untyped code. This paper covers implementation issues and properties of the resulting type system. Finally, some perspectives on improving the precision and soundness of the type system are discussed.

Keywords: programming languages; gradual typing; type safety; cast calculus

For citation: Kryshtapovich V.S. Localized Lama Gradual Typing. Trudy ISP RAN/Proc. ISP RAS, vol. 33, issue 3, 2021, pp. 61-76. DOI: 10.15514/ISPRAS-2021-33(3)-5.

Localized Application of Gradual Typing

V.S. Kryshtapovich, ORCID: 0000-0002-3941-6201 <[email protected]>

ITMO University, Kronverksky Pr. 49, St. Petersburg, 197101

Abstract. Gradual typing is a modern approach to combining the advantages of static and dynamic typing. Although scientific research is aimed at the soundness of type systems, many languages intentionally make their type system unsound to improve performance. This work is devoted to the implementation of a dialect of the Lama language that supports gradual typing for explicitly marked parts of the code. The goal of the implementation is to combine two approaches: type safety in some parts of the code and language performance in other parts. The paper covers the implementation details and the properties of the resulting type system. Ways of improving the completeness and soundness of the resulting type system are also considered.

Keywords: programming languages; gradual typing; type systems; cast calculus

For citation: Kryshtapovich V.S. Localized application of gradual typing. Trudy ISP RAN (Proceedings of ISP RAS), vol. 33, issue 3, 2021, pp. 61-76 (in English). DOI: 10.15514/ISPRAS-2021-33(3)-5.

1. Introduction

There are different approaches to implementing a type system. Static type systems are well known for preventing many undesired program behaviors at compile time by reasoning about the values that an expression may or may not take (e.g., Java, Haskell, ...). On the opposite side, dynamic type systems are well known as the most flexible ones: low compilation prerequisites and delegation of type safety to runtime allow rapid development and prototyping (e.g., Python, Racket, ...).

A combination of the two approaches is called «Gradual Typing». This technique has drawn a lot of attention since Siek and Taha published their article [1], which presents a sound type system for a Lisp dialect, a partially typed functional language. The existence of a sound system for this model language gave rise to a lot of research in the field. However, the practical application of sound gradual systems is still questionable because of performance issues [2]. The key purpose of this article is to see how gradual typing and explicit unsafe code annotations can be integrated with each other as native language syntax. The desired result is a language that lets the programmer control the trade-off between performance and type safety. Lama [3] version 1.00 is used as the target language of our research. Consider typical Python code: most probably it is untyped. By 2020 only 3.8% of repositories had type annotations [4]. Still, the idea of gradual typing is powerful: let programmers add static type information to the code expression by expression. Thus, untyped code can be converted step by step into fully statically typed code with the corresponding static guarantees.

This is gradual typing: on the one hand, static annotations prevent us from misusing functions and modules and help preserve contracts. On the other hand, we can switch the static type system off whenever we are overwhelmed by static type errors.

The most important result of the original article [1] was the soundness of the gradual type system. It was achieved by using a cast calculus and rewriting the original program with casts. A cast can be pictured as a bridge that a value crosses at runtime on its way from an untyped part of the code to a typed one. The bridge is annotated with a static type, and the value must conform to it when moving from less typed code to more typed code. So, the main idea is to insert casts correctly and obtain a program with the soundness property:

1) If the program does not typecheck, its execution may get stuck with a static type error that emerges at runtime (if untyped programs can be launched at all).

2) If the program typechecks, it can produce only dynamic type errors or cast errors. No errors involving incompatibility of static types may occur at runtime.

In other words, if a program is accepted by a sound typechecker, it can never violate the contracts that the programmer gave to expressions in the form of types. For instance, a variable statically typed as an integer can never hold a string value.

Gradual typing has been presented in several languages and in various forms, such as:

1) Python [5, 6] (MyPy [7] and PyType [8] projects);

2) Typed Racket [9];

3) JavaScript: TypeScript;

4) C# 4.0 with the dynamic keyword.

Although all of them are gradually typed (in the sense that not every object has a type known at compile time), their implementations of gradual typing differ strongly. Some are compiled into a dynamic target language: a TypeScript program, for example, is converted to pure JavaScript after compilation. Some are static by nature, like C#, and add a «dynamic» keyword which marks an object whose type is unknown until runtime. Some, like Python, incorporate optional type annotations and leave them to documentation and external tools (linters, typecheckers, IDEs).

The most noticeable fact about the state of the art in gradual typing is that industrial-level languages do not care much about the soundness of the type system. The reason is performance: some real programs exhibit slowdowns of over 20x, likely rendering them unusable for their actual purpose. To increase performance, many languages reduce the number of dynamic casts or remove them altogether. This leads to a trade-off between the soundness and the performance of a gradually typed language. To sum up, gradual typing provides a mechanism for checking program correctness with the following pros and cons:

• Types can be added ad hoc by the programmer.

• A gradual type system can be sound in certain languages (more frequently academic ones).

• Dynamic typechecks add significant overhead at runtime.

Given the diversity of implementations and approaches, it is interesting to look at the result of implementing gradual typing in a language with a different model of computation and semantics. We will test some new syntax concepts by experimenting with the Lama programming language.

Lama is a programming language developed by JetBrains Research for educational purposes as an exemplary language to introduce the domain of programming languages, compilers and tools [3]. Its most noticeable property is that it is fundamentally untyped. The reference manual says that the lack of a type system is an intentional decision which makes it possible to show the unchained diversity of runtime behaviors. At the same time, the manual says that the language can be used in the future as a raw substrate for applying various ways of software verification (including type systems) [10]. So why not try to implement some kind of type system on top of it? In our work we test a new approach of combining parts of code to which different rules of static verification apply: some parts of the code are gradually typed, and some are left untyped. The expected result is a programming language that can mix two kinds of code:

• with semantics that respects type safety in necessary parts of the code (e.g., sound);

• with original semantics without overheads.

This should allow the programmer to choose which parts of the program are gradually typed and which are not typed at all.

Another expected result is that gradually typed code runs slower. The slowdown may be arbitrary, but we will try to reproduce the results from the article (a slowdown of at least 2x).

2. Examples

To give the reader a proof of concept, we consider the concrete syntax and pragmatics of pieces of code written in Lama and describe how types are introduced into the language and what they are expected to do. Normally, code in Lama looks as follows. No types, just an anarchy of undefined behaviors:

fun closure(x) {
  fun (y) {
    2*x*y
  }
}

In this example the function takes x as an argument and returns a function that multiplies its argument by 2 * x. One expects it to be used with integers, but Lama does not prevent a call like closure("Hello, ")("world!"), and we can only pray that the runtime does not crash. We can use type annotations to state our intentions about the code:

fun closure(x :: Int) :: Int -> Int {
  fun (y :: Int) :: Int {
    2*x*y
  }
}

What do we expect from introduced type annotations?

• Backward compatibility with existing untyped source code;

• Static compile-time checks;

• Dynamic runtime checks.

Moreover, we would like something like type inference:

fun closure(x :: Int) {
  fun (y :: Int) {
    2*x*y
  }
}

If the types of x and y are known at compile time, the types of the functions can be inferred: the inner function has type Int -> Int, and the outer function has type Int -> Int -> Int. Moreover, Lama currently supports operations only on integer constants (Int). If we take a closer look at the untyped example, we can infer that x and y should have type Int, because they are used in the expression 2 * x * y, and then infer the function types, which makes this piece of code fully typed.

At first glance, type inference seems to contradict backward compatibility, because some untyped expressions become implicitly typed, as in the first example. Runtime typechecks are then inserted into parts of the code that were initially untyped, which affects their semantics. Thankfully, the developers of Lama left regression tests that check backward compatibility, so we can introduce type inference while keeping backward compatibility in mind. Another example of typing Lama programs is pattern matching:

fun processA(a) {
  case a of
    A (0) -> "1"
  | A (x) -> "2"
  esac
}

The A(0) notation is a so-called S-expression [11]. A quick Lama-specific introduction: an S-expression can be considered a labeled array of arbitrary values. The label must be capitalized, and the number of values is not bounded. Two S-expression labels are considered equal in Lama if their first five letters are the same, so Branch(Leaf, Leaf, 3) and Branc(Leaf, Leaf, 3) are equal S-expressions. Leaf here is a nested S-expression with zero values in it; brackets are optional for zero-arity S-expressions.

Side note: S-expressions named Int and Str have the types Int :: Int() and Str :: Str(), which distinguishes them from the types of integers (3 :: Int) and strings ("smoothie" :: Str). Back to the processA function: if a matches A(0), the function produces "1"; for any other value A(smth), where smth is not 0, it produces "2". A call processA(B(0)) results in a runtime error from pattern matching. So, the other things we would like from our type system are:

• Check that the branches cover all matched expressions, i.e., that no runtime error can occur in pattern matching.

• Detect branches that can never succeed: those that are covered by a previous branch or simply do not conform to the matched expression.

For example, type system should reject this Lama program:

local foo = fun (x :: A(Int)) {
  case x of
    A (0) -> "1"
  | A (x, y) -> "3" anything
  esac
};

Here the type system can check two things. First, x = A(1) does not match any branch, so not all possible values of x are covered. Second, A(x, y) can never match a value of type A(Int).

Also note that function definitions in Lama have convenient sugar that combines them with pattern matching and can be used to check input arguments:

public fun id2 (Abc (x, y)) :: ? {
  x
}

write(id2(Abc(6, 8)));
write(id2(Xyz(6, 8))); -- static fail

The last example that we should consider relates to runtime checks. Let's look at this simple piece of code:

fun intStringer(x :: Int) {
  x.string
}

local dyn :: ? = "Can be anything";
dyn := intStringer; -- forget type
dyn("input") -- should it fail?

At first glance it is unclear where the problem is, because dyn("input") would reduce to "input".string and then to "input". Do we actually care that the function stored at runtime originally takes Int? The answer is yes:

fun intStringer(x :: Int) {
  (x + 1).string
}

Now, if we try to reduce dyn("input"), we get "input" + 1 and end up with a runtime error of casting "input" to Int. But what is the real cause of this error, and whom should we blame [12] [13] [14] for it: the plus operator or the input to intStringer? That is why function arguments should be checked by wrapping them in appropriate dynamic casts. If we follow the blame ideology, dyn("input") fails in both implementations for the same reason: the function expected Int but was given Str. However, this solution can lead to extra checks and slower execution.
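The argument-wrapping idea can be sketched outside Lama. The following Python fragment is only an illustration under stated assumptions: the names `CastError`, `cast` and `checked` are hypothetical and are not part of the Lama implementation. It shows that, once the argument is cast at the call boundary, both versions of intStringer fail for the same reason.

```python
class CastError(Exception):
    """Raised when a value does not conform to the declared type."""


def cast(value, expected, where):
    # Check the value at the boundary and name the boundary in the error.
    if not isinstance(value, expected):
        raise CastError(
            f"{where}: expected {expected.__name__}, given {type(value).__name__}"
        )
    return value


def checked(func, arg_type):
    # Wrap a one-argument function so that its argument is cast on entry.
    def wrapper(x):
        return func(cast(x, arg_type, func.__name__))
    return wrapper


def int_stringer_v1(x):
    return str(x)        # never inspects x, so "input" would slip through


def int_stringer_v2(x):
    return str(x + 1)    # without the cast, "input" fails inside "+"


dyn_v1 = checked(int_stringer_v1, int)
dyn_v2 = checked(int_stringer_v2, int)
```

Calling `dyn_v1("input")` or `dyn_v2("input")` raises `CastError` at the function boundary in both cases, at the price of one extra check per call.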

After seeing these examples we conclude that such features would be useful in the untyped Lama language. The typechecker would decrease the number of errors made by programmers, and runtime casts would inform the programmer when untyped code does not conform to the contracts of typed code. In the next section we define the syntax of gradual types and their semantics.

3. Type Annotations Definition and Semantics

Gradual typing assumes that the user annotates parts of the program with types, so we should provide this feature in the Lama compiler. The syntax rules are described in the Lama specification. We change them only slightly, because we touch only variable definitions (global and scope), function definitions and their input parameters; see p. 10 of [10] for a more detailed specification of the language syntax.

The modified nonterminals are shown in Fig. 1: we just add static type annotations to variable definitions and function definitions. The nonterminal functionArguments was also slightly changed with respect to the specification in order to respect the pattern matching sugar, which for some reason is not included in the concrete syntax definition. The other nonterminals are assumed to be taken from the section "Concrete syntax and semantics" of the specification [10].

The definition of type annotations, typeExpression, is presented in Fig. 2. Its semantics (see $\tau$ in Fig. 3) is almost straightforward. The syntax rule typeAny corresponds to the dynamic type TAny, which can hold an arbitrary value. The rule typeArray corresponds to an array TArr of a certain type. The rule typeSexp corresponds to TSexp with the parsed UIDENT as the name of the S-expression and a list of types forming the type of the S-expression. The rule typeArrow corresponds to the arrow type TLambda; note that the number of input arguments can vary from zero to an arbitrary amount. The rule typeUnion corresponds to TUnion and lists all the types that a value can conform to.

variableDefinitionItem : LIDENT [ :: typeExpression ] [ = basicExpression ]
functionDefinition : [ public ] fun LIDENT ( functionArguments ) functionBody [ :: typeExpression ]
functionArguments : [ funArgItem ( , funArgItem )* ]
funArgItem : simplePattern [ :: typeExpression ]

Fig. 1. Syntax extension: scope expressions with type annotations

typeExpression : typeUnion | typeArrow | typeSexp | typeArray | typeAny | ( typeParser )
typeAny : ?
typeArray : '[' typeExpression ']'
typeSexp : UIDENT [ ( typeList0, ) ]
typeArrow : typeExpression -> typeExpression | ( typeList0, ) -> typeExpression
typeUnion : Union [ typeList0; ]
typeList0, : [ typeExpression ( , typeExpression )* ]
typeList0; : [ typeExpression ( ; typeExpression )* ]

Fig. 2. Typing expression syntax

Fig. 3. Typing expression semantics

Only the typeSexp rule with zero arity has non-straightforward semantics. If the type parameters of the S-expression type are absent and UIDENT is one of the following names, the annotation denotes a built-in type:

• Int corresponds to integers, $\tau$ = TConst;

• Str corresponds to strings, $\tau$ = TString;

• Void corresponds to the empty set of values, $\tau$ = TVoid;

• otherwise, it corresponds to an S-expression with the specified name and no arguments.

If typeSexp is specified with brackets, it has the straightforward semantics of an S-expression. For example, Cons and Cons() have the same semantics, TSexp("Cons"), but the semantics of Int and Int() differ as integer and S-expression types: TConst and TSexp("Int") respectively.
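The special case for bracket-less names can be written down as a small lookup. This is a minimal Python sketch, not the OCaml code of the implementation; the tuple encoding of types and the function name `sexp_type` are assumptions made for illustration.

```python
# Built-in names that are only special when written WITHOUT brackets.
BUILTIN = {"Int": ("TConst",), "Str": ("TString",), "Void": ("TVoid",)}


def sexp_type(name, args=None):
    """Semantics of a typeSexp annotation.

    args is None when the annotation has no brackets (e.g. Int) and a list
    of already translated types when brackets are present (e.g. Int()).
    """
    if args is None and name in BUILTIN:
        return BUILTIN[name]
    return ("TSexp", name, tuple(args or ()))
```

With this encoding `sexp_type("Cons")` and `sexp_type("Cons", [])` give the same S-expression type, while `sexp_type("Int")` and `sexp_type("Int", [])` differ.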

4. Typechecking Rules

Typechecking is inserted into the compilation pipeline directly after the AST (Abstract Syntax Tree) representation of the program has been built (see "src/Language.ml" and "src/Driver.ml" in the Lama source code [3]). The typechecker performs the following procedures on the AST simultaneously: type checking, type inference and cast insertion.

For a detailed description of these three problems we need to describe the classes of expressions, values, patterns and types of the language:

• $\tau$ is the class of type expressions (see Fig. 3);

• $e$ is the class of expressions (see Fig. 4);

• $v$ is the class of values (see Fig. 5);

• $p$ is the class of patterns (see Fig. 6).

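$$e ::= \ldots \mid \mathrm{Var}(s) \mid \mathrm{Ref}(s) \mid \mathrm{Cast}(e, \tau) \mid \mathrm{Binop}(s, e, e) \mid \ldots \mid \mathrm{Repeat}(e, e) \mid \mathrm{Case}(e, (p, e)^{*}) \mid \mathrm{Return}(e) \mid \mathrm{Ignore}(e) \mid \mathrm{Scope}((s, \tau)^{*}, e) \mid \mathrm{Lambda}((s, \tau)^{*}, e, \tau)$$

Fig. 4. Lama expression class

$$v ::= \ldots \mid \mathrm{VFunRef}(s, i, e, i) \mid \mathrm{VBuiltin}(s) \mid \mathrm{VCast}(v, \tau)$$

Fig. 5. Lama value class

$$p ::= \mathrm{PWildcard} \mid \mathrm{PConst}(i) \mid \mathrm{PString}(s) \mid \ldots \mid \mathrm{PSexpTag} \mid \mathrm{PArrayTag} \mid \mathrm{PClosureTag}$$

Fig. 6. Lama pattern class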

There are also additional classes that are built into the implementation language (OCaml). They can be considered value classes:

• i - integer;

• s - string.

Let us denote by $V$ the set of variables, which are represented by OCaml strings $s$, and by $T$ the set of types. We should think of $T$ as wider than the set of types induced by the type constructors of Fig. 3. In other words, some type $\gamma \in T$ may not be expressible with the type constructors.

If we simplify the compilation process a little and ignore external symbol resolution, the Lama parser generates an expression of class $e$ without Cast constructors, i.e., a pure untyped Lama expression. Notice that an expression can also contain patterns $p$ because of pattern matching in Case expressions. We then have several options for dealing with the generated AST. The trivial option is to leave the expression untouched and get the semantics of classic Lama. The first option is to try to typecheck the expression statically: if we succeed in deriving a static type for the program represented as one whole expression, we can conclude that there is no static misuse of typed expressions. The second option is to transform the AST by inserting casts where values pass from untyped parts of the code to typed ones. We will build an algorithm that performs static typechecking and dynamic cast insertion simultaneously. For type checking we need to answer a question: does a type $\tau_1 \in T$ conform to another type $\tau_2 \in T$? The answer is given by the relation $\sim$, named "conforms", which is constructed from the axioms presented in Fig. 7.

We should pay additional attention to the TUnion type and its rules. It denotes a type that holds all the values that its constituent types can hold. It arises naturally from such language expressions as If, Case and Return. We have chosen a set-theoretic approach to typing such expressions. Although there is an algorithm for union contraction, the set-theoretic approach to type combination may lead to certain drawbacks in correctness and to decreased compile-time performance.

Speaking about correctness: the rules ConfTUnion1 and ConfTUnion2 generally cannot prove that two type representations conform to each other even when they really do. Thus, the lack of completeness shows up as false positives generated by the static typechecker: correct type-annotated Lama expressions can be rejected by a typechecker with such a definition of the relation. This is a common illness of every static typechecker, because we would like to check a nontrivial property of the code: to be statically correct [15].
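The conformance relation of Fig. 7 and its incompleteness can be illustrated with a short Python sketch. It is not the implementation from the paper: types are encoded as tuples, function types are left out, and the order in which the two union rules are tried is an assumption of this sketch.

```python
ANY, INT, STR = ("TAny",), ("TConst",), ("TString",)


def conforms(a, b):
    """Return True when type a conforms to type b (a ~ b)."""
    if a[0] == "TAny" or b[0] == "TAny":       # ConfTAny, ConfTAny2
        return True
    if a[0] == "TUnion":                       # ConfTUnion1: every member conforms
        return all(conforms(t, b) for t in a[1])
    if b[0] == "TUnion":                       # ConfTUnion2: some member accepts a
        return any(conforms(a, t) for t in b[1])
    if a[0] != b[0]:
        return False
    if a[0] in ("TArr", "TRef"):               # ConfTArr, ConfTRef
        return conforms(a[1], b[1])
    if a[0] == "TSexp":                        # ConfTSexp
        return (a[1] == b[1] and len(a[2]) == len(b[2])
                and all(conforms(x, y) for x, y in zip(a[2], b[2])))
    return a == b                              # ConfTGround


# Two descriptions of the same set of values: A(Int) or A(Str).
wide = ("TSexp", "A", (("TUnion", (INT, STR)),))
split = ("TUnion", (("TSexp", "A", (INT,)), ("TSexp", "A", (STR,))))
```

Here `conforms(split, wide)` holds, but `conforms(wide, split)` does not, although both types describe the same values. This is the kind of false positive that the union rules produce.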

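$$\frac{}{\tau \sim \mathrm{TAny}}\ [\text{ConfTAny}] \qquad \frac{}{\mathrm{TAny} \sim \tau}\ [\text{ConfTAny2}]$$

$$\frac{\tau \sim \tau'}{\mathrm{TArr}(\tau) \sim \mathrm{TArr}(\tau')}\ [\text{ConfTArr}] \qquad \frac{\tau \sim \tau'}{\mathrm{TRef}(\tau) \sim \mathrm{TRef}(\tau')}\ [\text{ConfTRef}]$$

$$\frac{s = s' \wedge \left[\bigwedge_{i=1}^{n} \tau_i \sim \tau_i'\right]}{\mathrm{TSexp}(s, \tau_1 \ldots \tau_n) \sim \mathrm{TSexp}(s', \tau_1' \ldots \tau_n')}\ [\text{ConfTSexp}]$$

$$\frac{\tau_r \sim \tau_r' \wedge \left[\bigwedge_{i=1}^{n} \tau_i' \sim \tau_i\right]}{\mathrm{TLambda}(\tau_1 \ldots \tau_n, \tau_r) \sim \mathrm{TLambda}(\tau_1' \ldots \tau_n', \tau_r')}\ [\text{ConfTLambda}]$$

$$\frac{\bigwedge_{i=1}^{n} \tau_i \sim \tau}{\mathrm{TUnion}(\tau_1 \ldots \tau_n) \sim \tau}\ [\text{ConfTUnion1}] \qquad \frac{\bigvee_{i=1}^{n} \tau \sim \tau_i'}{\tau \sim \mathrm{TUnion}(\tau_1' \ldots \tau_n')}\ [\text{ConfTUnion2}]$$

$$\frac{\tau = \tau'}{\tau \sim \tau'}\ [\text{ConfTGround}]$$

Fig. 7. Rules of conformance to the other type

The good news is that no type intersections (TIntersection) or type subtractions (TSubstraction) appear: we try to avoid them when building the type system for Lama. Now we could define an analogue of the $\sim$ relation between an expression $e$ and a type $\tau$. Instead, we will infer the type of the expression. To start with something simple, let us define type inference for patterns (see Fig. 8).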

Notice that we infer both a lower and an upper bound for the pattern type. This interval-style inference is crucial for analyzing case expressions. Let $\tau_l(p) \in T$ denote the inferred lower bound for a pattern $p$ and $\tau_r(p) \in T$ the inferred upper bound. The notation $\tau(p)$ means the set of all possible values captured by the pattern $p$. With the chosen type constructors and their semantics we can conclude:

• $\tau_r$ is a type that covers all possible values captured by the pattern (upper bound);

• $\tau_l$ is a type that is covered by all possible values captured by the pattern (lower bound).

For example, the value Suc(1) has type TSexp("Suc", TConst), but this value alone covers almost nothing, so $\mathrm{TVoid} \subseteq \{\mathrm{Suc}(1)\} \subseteq \mathrm{TSexp}(\text{"Suc"}, \mathrm{TConst})$.
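The interval-style inference can be sketched in Python for a few pattern forms. The tuple encoding and the function name `bounds` are assumptions of this sketch, and only some of the pattern constructors are covered.

```python
ANY, VOID, CONST, STRING = ("TAny",), ("TVoid",), ("TConst",), ("TString",)


def bounds(p):
    """Return the pair (lower bound, upper bound) for pattern p."""
    kind = p[0]
    if kind == "PWildcard":
        return ANY, ANY
    if kind == "PNamed":                 # PNamed(s, p'): same bounds as p'
        return bounds(p[2])
    if kind == "PConst":                 # a single integer covers almost nothing
        return VOID, CONST
    if kind == "PString":
        return VOID, STRING
    if kind == "PUnboxed":               # matches exactly the integers
        return CONST, CONST
    if kind == "PSexp":                  # PSexp(s, (p1, ..., pn)): componentwise
        subs = [bounds(q) for q in p[2]]
        return (("TSexp", p[1], tuple(lo for lo, _ in subs)),
                ("TSexp", p[1], tuple(hi for _, hi in subs)))
    raise ValueError(f"pattern not covered by this sketch: {kind}")
```

For the pattern Suc(1) the sketch returns the interval from TSexp("Suc", TVoid) to TSexp("Suc", TConst), while a wildcard is exact: both of its bounds are TAny.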

Now we are ready to describe the main part of the algorithm: type inference and cast insertion for Lama expressions. We use the notation $e \mapsto e' : \tau$, which means that the expression $e$ has type $\tau$ and that cast insertion turns it into an expression $e'$ of the same type $\tau$. In addition, we have two kinds of contexts: the typing context $\Gamma : V \to T$, which assigns types to variable names, and a set of types $\Delta \subseteq T$, which collects information about function return types. Given a context and the return types collected so far, the typechecker produces another collection of return types (probably bigger than the original one), the expression rewritten with casts, and its type. So, the full notation of the algorithm is:

$$\Gamma, \Delta \vdash e \mapsto \Delta' \vdash e' : \tau.$$

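$$\mathrm{TAny} \subseteq \tau(\mathrm{PWildcard}) \subseteq \mathrm{TAny} \quad [\text{InferPWildcard}]$$

$$\frac{\tau_l \subseteq \tau(p) \subseteq \tau_r}{\tau_l \subseteq \tau(\mathrm{PNamed}(s, p)) \subseteq \tau_r}\ [\text{InferPNamed}]$$

$$\mathrm{TVoid} \subseteq \tau(\mathrm{PConst}(i)) \subseteq \mathrm{TConst} \quad [\text{InferPConst}]$$

$$\mathrm{TVoid} \subseteq \tau(\mathrm{PString}(s)) \subseteq \mathrm{TString} \quad [\text{InferPString}]$$

$$\mathrm{TConst} \subseteq \tau(\mathrm{PUnboxed}) \subseteq \mathrm{TConst} \quad [\text{InferPUnboxed}]$$

$$\mathrm{TString} \subseteq \tau(\mathrm{PStringTag}) \subseteq \mathrm{TString} \quad [\text{InferPStrTag}]$$

$$\mathrm{TVoid} \subseteq \tau(\mathrm{PSexpTag}) \subseteq \mathrm{TAny} \quad [\text{InferPSexpTag}]$$

$$\mathrm{TVoid} \subseteq \tau(\mathrm{PClosureTag}) \subseteq \mathrm{TAny} \quad [\text{InferPClosureTag}]$$

$$\mathrm{TUnion}(\mathrm{TString}, \mathrm{TArr}(\mathrm{TAny})) \subseteq \tau(\mathrm{PBoxed}) \subseteq \mathrm{TAny} \quad [\text{InferPBoxed}]$$

$$\mathrm{TArr}(\mathrm{TAny}) \subseteq \tau(\mathrm{PArrayTag}) \subseteq \mathrm{TArr}(\mathrm{TAny}) \quad [\text{InferPArrTag}]$$

$$\frac{\tau_i \subseteq \tau(p_i) \subseteq \tau_i'}{\mathrm{TSexp}(s, \tau_1 \ldots \tau_n) \subseteq \tau(\mathrm{PSexp}(s, p_1 \ldots p_n)) \subseteq \mathrm{TSexp}(s, \tau_1' \ldots \tau_n')}\ [\text{InferPSexp}]$$

$$\frac{\tau_i \subseteq \tau(p_i) \subseteq \tau_i'}{\mathrm{TArr}(\mathrm{TUnion}(\tau_1 \ldots \tau_n)) \subseteq \tau(\mathrm{PArr}(p_1 \ldots p_n)) \subseteq \mathrm{TArr}(\mathrm{TUnion}(\tau_1' \ldots \tau_n'))}\ [\text{InferPArray}]$$

Fig. 8. Rules of lower and upper bound type inference for patterns

Fig. 9 and 10 present the full set of rules for type inference of Lama expressions in the notation $\Gamma, \Delta \vdash e \mapsto \Delta' \vdash e' : \tau$. Let us highlight some features of the algorithm. The set of return types $\Delta$ is initialized with $\emptyset$. Note that the initial context $\Gamma$ maps every variable occurrence to the type TAny. The typechecker does not check whether a symbol is defined in an upper scope or correctly imported; the context is only meant to provide correct surrounding type information for expressions.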

The notation $\tau \in \{\mathrm{TSexp}, \mathrm{TString}, \ldots\}$ in the rule [InferLength] means that the top-level constructor of $\tau$ must be one of those listed in the curly braces.

In the rule InferCall the cast to TAny is optional. It is used in the inference rules for consistency with the InferCall3 rule, which processes a call of an object of a union type.

Many of the rules, such as InferArr and InferSexp, could be simplified by removing $\Delta$, because they only recompute $\Delta$ for expressions that never change it in correct Lama programs. There are a few places where $\Delta$ is useful: the InferLambda, InferReturn1 and InferReturn2 rules. Notice that we infer the return type of a function only to check that it fits the type declared by the user; the declared interface does not change. But if the type is not specified by the user, the inferred type is used implicitly.

Also notice the InferCase rule. First, we collect return types from the branches while threading $\Delta$ through the computation. Second, look at the notation $\Gamma \cup \Gamma_r(p_i)$: it extends the typing context with a mapping from the names bound by PNamed patterns to their types. $\Gamma_r$ can be defined via $\tau_r$ as follows:

$$\Gamma_r(p) = \begin{cases} \Gamma_r(p') \cup \{s : \tau_r(p)\} & p = \mathrm{PNamed}(s, p') \\ \bigcup_{i=1}^{n} \Gamma_r(p_i) & p = \mathrm{PArr}(p_1, \ldots, p_n) \\ \bigcup_{i=1}^{n} \Gamma_r(p_i) & p = \mathrm{PSexp}(s, p_1, \ldots, p_n) \\ \emptyset & \text{otherwise} \end{cases}$$
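The definition above is a plain recursion over the pattern. A minimal Python sketch follows; the tuple encoding, the helper `upper` (upper bounds for a handful of pattern forms only) and the name `bindings` are assumptions made for illustration.

```python
ANY, CONST = ("TAny",), ("TConst",)


def upper(p):
    # Upper bound for the few pattern forms used in this sketch.
    if p[0] == "PWildcard":
        return ANY
    if p[0] == "PNamed":
        return upper(p[2])
    if p[0] == "PConst":
        return CONST
    if p[0] == "PSexp":
        return ("TSexp", p[1], tuple(upper(q) for q in p[2]))
    if p[0] == "PArr":
        return ("TArr", ("TUnion", tuple(upper(q) for q in p[1])))
    raise ValueError(f"pattern not covered by this sketch: {p[0]}")


def bindings(p):
    """Map every name bound by a PNamed inside p to its upper-bound type."""
    if p[0] == "PNamed":                       # PNamed(s, p')
        env = bindings(p[2])
        env[p[1]] = upper(p)
        return env
    if p[0] in ("PSexp", "PArr"):              # union over the subpatterns
        subpatterns = p[2] if p[0] == "PSexp" else p[1]
        env = {}
        for q in subpatterns:
            env.update(bindings(q))
        return env
    return {}                                  # no names bound otherwise
```

For the pattern Abc(x, y) with two named wildcards, `bindings` returns a context in which both x and y have type TAny.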

r,A h Const (¿J HfAh Const(>) : TConst

r, A y String (s) 1-4 A I- String(s) ; TString A0 := A r, h et hlv^hej : T, 1'. A h Arric] ... e„) A„ b Arr(e^): TArriTUnionffl)) Au := A r,A, 1 h e. H. A, h ej.tTj

Sexp(j.-.ci .. .c„) i-f A„ h Sexp(e, ej) : TSexp(s,Ti)

Ti»! = t

[InferC [InferStrino]

LInfer^rr]

[InferSexp]

r. A h var(s) >-> A I- vat(s) : r

_rw - T-_

[InferVar] [InferRef]

T, A I- Ref(s) >-:■ A I- Ref (ir) : TRef(r) r. A I- r'i n- Aj I- r\ :t, n ~ TConst T, A| hi^HjA;!-^: r-j T2 Si TConst 1'. A - Binop(3.e[. i-> Am h Binopfs. Cast(c;, TCor.st ),Cas-(fj. TConst)): TCons::

1'. A I- f| M Ai h ci : TAny l\ A| I- e2 n Ai h c'2 : r2 t2 — TConst

LT nf erBiníip J

r, AI-Eleni(eiie2)'-4 A^l- Elejn(iLr|,Cast(t^.TCon5t)) : TAny

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

T.A h ej !-)■ Ai I- : TStr r.Ai I-fj «ijl T! ^ TConst r, A H Blemffi,, ea) nAjh E lem (e',, Cast(e4,TConst)); TConst

r. A h e, >-+ A] 1- e^ : TAtr(-T|) r. A] hjHAjNjir; t^ ~ TConst !', A t- Elem(rii(Fj) n- As h Elem(c'| ¡Cast TConst)) . r,

I", A I- r i-t A i I- ii' : t re (TAny. TArr, TString. TSexp)

Lxrif^CHUftTTl]

[ InferElernof str]

[InterElemOfArrJ

I". A H Length(c) >-4 Ai I- Length(e') : TConat

í", A h n-s Aj b i?' -1 F, A i- StringVal(r) i-t Ai I- Strin#al(e') ; TSt rl ng

r, A h / -t An I- f ; TAny r,A¿ ,hjHÍi,e;:r¡

[InferLengch]

[InferCall]

i", A I- Calif/. hCast(Call(/'.rí;).TAny) : TAny

T. A I / An I - /' i TLa[tifcda(7i ...r) m ^ n r,A,-i h AT.c¡ :r: r, ^

[ i.nfercaJ ii:

r. A !- callf/.e,.. .sj A„ I- cast (Gal t{/',Ca$1 (oj.-y,)), t) : r

",AI- f^Ag^f : TUnionj-.i ... yL) riji I- Call( ') i-v Aj, Cast (CallQ/. EJ*). T,) : u

[". A I- Ca 11 (/, t-j ... i-> A„, I- Ca3t (Call(/"'.77"). TUnion(ñ)) : TUnio[i(j7)

I". A h r i-> A i I- r' : TAny I", Ai henA¡hí':r

[InferCaLlS;

I" A Assignor) hi h Assign(r'. ''i : TAny

r. A h r h- A, h r' : TRaf [p) r, A[ I- I; nAfhf'ir T. A h Agsignfr, r1) t-i- A | h Assign(rJ, fr) ■ p

[InlerAssi^nl]

r. Inf erAssigni 1

Fig. 9. Rules of type inference and cast insertion The third one about InferCase is that there is a check that all branches cover target type: m ~ TUnion(ri(pi)). And the fourth: notice that each pattern is checked for code execution availability Tr(Pi) ~ m, and at the same time we check that branch is not hidden by earlier branch rr(pj) ^ ri(p;). According to inequalities

^r(Pi) ~ T,(p;) ^ r(Pi) c rr(Pi) c T,(p;) c (p;.).

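$$\frac{\Gamma, \Delta \vdash e_1 \mapsto \Delta_1 \vdash e_1' : \tau_1 \quad \Gamma, \Delta_1 \vdash e_2 \mapsto \Delta_2 \vdash e_2' : \tau}{\Gamma, \Delta \vdash \mathrm{Seq}(e_1, e_2) \mapsto \Delta_2 \vdash \mathrm{Seq}(e_1', e_2') : \tau}$$

$$\Gamma, \Delta \vdash \mathrm{Skip} \mapsto \Delta \vdash \mathrm{Skip} : \mathrm{TVoid} \quad [\text{InferSkip}]$$

$$\frac{\Gamma, \Delta \vdash c \mapsto \Delta_1 \vdash c' : \sigma \quad \sigma \sim \mathrm{TConst} \quad \Gamma, \Delta_1 \vdash e_t \mapsto \Delta_2 \vdash e_t' : \tau_t \quad \Gamma, \Delta_2 \vdash e_f \mapsto \Delta_3 \vdash e_f' : \tau_f}{\Gamma, \Delta \vdash \mathrm{If}(c, e_t, e_f) \mapsto \Delta_3 \vdash \mathrm{If}(\mathrm{Cast}(c', \mathrm{TConst}), e_t', e_f') : \mathrm{TUnion}(\tau_t, \tau_f)}\ [\text{InferIf}]$$

$$\frac{\Gamma, \Delta \vdash c \mapsto \Delta_1 \vdash c' : \sigma \quad \sigma \sim \mathrm{TConst} \quad \Gamma, \Delta_1 \vdash e \mapsto \Delta_2 \vdash e' : \tau}{\Gamma, \Delta \vdash \mathrm{While}(c, e) \mapsto \Delta_2 \vdash \mathrm{While}(\mathrm{Cast}(c', \mathrm{TConst}), e') : \mathrm{TVoid}}\ [\text{InferWhile}]$$

$$\frac{\Gamma, \Delta \vdash c \mapsto \Delta_1 \vdash c' : \sigma \quad \sigma \sim \mathrm{TConst} \quad \Gamma, \Delta_1 \vdash e \mapsto \Delta_2 \vdash e' : \tau}{\Gamma, \Delta \vdash \mathrm{Repeat}(e, c) \mapsto \Delta_2 \vdash \mathrm{Repeat}(e', \mathrm{Cast}(c', \mathrm{TConst})) : \mathrm{TVoid}}\ [\text{InferRepeat}]$$

$$\frac{\begin{array}{c} \Gamma, \Delta \vdash m \mapsto \Delta_0 \vdash m' : \omega \qquad \Gamma \cup \Gamma_r(p_i), \Delta_{i-1} \vdash e_i \mapsto \Delta_i \vdash e_i' : \tau_i \\ \tau_r(p_i) \sim \omega \qquad \omega \sim \mathrm{TUnion}(\tau_l(p_1) \ldots \tau_l(p_n)) \qquad \forall j < i.\ \tau_r(p_i) \not\sim \tau_l(p_j) \end{array}}{\Gamma, \Delta \vdash \mathrm{Case}(m, (p_1, e_1) \ldots (p_n, e_n)) \mapsto \Delta_n \vdash \mathrm{Case}(m', (p_1, e_1') \ldots (p_n, e_n')) : \mathrm{TUnion}(\tau_1 \ldots \tau_n)}\ [\text{InferCase}]$$

$$\Gamma, \Delta \vdash \mathrm{Return} \mapsto \Delta \cup \{\mathrm{TVoid}\} \vdash \mathrm{Return} : \mathrm{TVoid} \quad [\text{InferReturn1}]$$

$$\frac{\Gamma, \Delta \vdash e \mapsto \Delta' \vdash e' : \tau}{\Gamma, \Delta \vdash \mathrm{Return}(e) \mapsto \Delta' \cup \{\tau\} \vdash \mathrm{Return}(e') : \mathrm{TVoid}}\ [\text{InferReturn2}]$$

$$\frac{\Gamma, \Delta \vdash e \mapsto \Delta' \vdash e' : \tau}{\Gamma, \Delta \vdash \mathrm{Ignore}(e) \mapsto \Delta' \vdash \mathrm{Ignore}(e') : \mathrm{TVoid}}\ [\text{InferIgnore}]$$

$$\frac{\Gamma \cup \{(s_i, \tau_i)\}_{i=1}^{n}, \Delta \vdash e \mapsto \Delta' \vdash e' : \tau}{\Gamma, \Delta \vdash \mathrm{Scope}((s_i, \tau_i)_{i=1}^{n}, e) \mapsto \Delta' \vdash \mathrm{Scope}((s_i, \tau_i)_{i=1}^{n}, e') : \tau}\ [\text{InferScope}]$$

$$\frac{\Gamma \cup \{(s_i, \tau_i)\}_{i=1}^{n}, \emptyset \vdash e \mapsto \Delta' \vdash e' : \delta \quad \mathrm{TUnion}(\Delta' \cup \{\delta\}) \sim \tau}{\Gamma, \Delta \vdash \mathrm{Lambda}((s_i, \tau_i)_{i=1}^{n}, e, \tau) \mapsto \Delta \vdash \mathrm{Lambda}((s_i, \tau_i)_{i=1}^{n}, e', \tau) : \mathrm{TLambda}(\tau_1 \ldots \tau_n, \tau)}\ [\text{InferLambda}]$$

Fig. 10. Rules of type inference and cast insertion (part II).

In other words, when the relation $\tau_r(p_i) \sim \tau_l(p_j)$ holds for some $j < i$, it is certain that the pattern $p_i$ is already covered by the earlier pattern $p_j$. In this way we eliminate the need to introduce intersection or difference types into our type system. This does not mean that intersection and difference types cannot be dealt with; see [17] or [18] for examples of polymorphic type systems that handle them.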
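The reachability and coverage checks of InferCase can be tried out on the rejected example from Section 2. The Python sketch below is only an illustration under stated assumptions: it uses a reduced conformance relation and reduced pattern bounds (S-expressions, integers, wildcards, unions), the name `check_case` is hypothetical, and the check for branches hidden by earlier branches is left out.

```python
ANY, VOID, CONST = ("TAny",), ("TVoid",), ("TConst",)


def conforms(a, b):
    # Reduced version of the "conforms" relation, enough for this example.
    if a[0] == "TAny" or b[0] == "TAny":
        return True
    if a[0] == "TUnion":
        return all(conforms(t, b) for t in a[1])
    if b[0] == "TUnion":
        return any(conforms(a, t) for t in b[1])
    if a[0] == b[0] == "TSexp":
        return (a[1] == b[1] and len(a[2]) == len(b[2])
                and all(conforms(x, y) for x, y in zip(a[2], b[2])))
    return a == b


def bounds(p):
    # (lower, upper) bounds for the pattern forms used below.
    if p[0] == "PWildcard":
        return ANY, ANY
    if p[0] == "PNamed":
        return bounds(p[2])
    if p[0] == "PConst":
        return VOID, CONST
    if p[0] == "PSexp":
        subs = [bounds(q) for q in p[2]]
        return (("TSexp", p[1], tuple(lo for lo, _ in subs)),
                ("TSexp", p[1], tuple(hi for _, hi in subs)))
    raise ValueError(f"pattern not covered by this sketch: {p[0]}")


def check_case(scrutinee, patterns):
    """Return error messages for a case over a value of type `scrutinee`."""
    errors = []
    bs = [bounds(p) for p in patterns]
    for i, (_, hi) in enumerate(bs):
        if not conforms(hi, scrutinee):        # upper bound must fit the target
            errors.append(f"branch {i} can never match")
    if not conforms(scrutinee, ("TUnion", tuple(lo for lo, _ in bs))):
        errors.append("branches do not cover the matched type")
    return errors


x = ("PNamed", "x", ("PWildcard",))
y = ("PNamed", "y", ("PWildcard",))
a_int = ("TSexp", "A", (CONST,))               # the annotation x :: A(Int)
rejected = [("PSexp", "A", (("PConst", 0),)), ("PSexp", "A", (x, y))]
accepted = [("PSexp", "A", (("PConst", 0),)), ("PSexp", "A", (x,))]
```

For `rejected`, the branches A(0) and A(x, y), `check_case(a_int, rejected)` reports both problems named in Section 2: the second branch can never match, and the branches do not cover A(Int). For `accepted` the list of errors is empty.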

The most complex is [InferScope] rule. It is intentionally simplified, because it's implementations is more subtle. Here it simply overwrites variable or function definition and updates context r. But implementation also checks that previous usage is corresponding with current typing when no expression is provided to variable. But to describe that strictly we would need to introduce a class for declarations and this rule would get even more complex.

This rule leads to a new language feature, usage typing of names inside a scope:

{
  f :: Int -> Str;
  g :: Int -> Int;
  f(g(0));         -- ok
  f(g(D(0)))       -- error
};

{
  f :: D(Int) -> Str;
  g :: D(Int) -> D(Int);
  f(g(0));         -- error
  f(g(D(0)));      -- ok
  {
    f :: [Int] -> Int;
    g :: Str -> [Int];
    f(g("hello, world"))
  }
};

The other type checking rules are either trivial or standard in the field [16], so we do not go into them in detail. In the next section we discuss the performance cost of our typechecking algorithm.
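To make the cast insertion performed by rules such as [InferIf] and [InferReturn2] more concrete, here is a minimal Python sketch of how we read them. It is not the Lama implementation: the AST classes, the string encoding of types, and the simplified `union` helper are all assumptions made for illustration. The function threads the set Δ of returned types through subexpressions and wraps the condition of `If` in a cast to `TConst`.

```python
from dataclasses import dataclass

# Hypothetical miniature AST. The names mirror the constructors used in the
# rules, but this is not the data model of the actual implementation.
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Cast:
    expr: object
    type: str

@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object

@dataclass(frozen=True)
class Return:
    expr: object

ANY = "?"  # the unknown (dynamic) type


def consistent(a, b):
    """Type consistency (~): the unknown type is consistent with any type."""
    return a == ANY or b == ANY or a == b


def union(a, b):
    """Simplified stand-in for TUnion: a set of two alternatives."""
    return a if a == b else ("TUnion", frozenset({a, b}))


def infer(gamma, delta, expr):
    """Return (delta', expr', type); delta collects the types passed to Return."""
    if isinstance(expr, Const):
        return delta, expr, "TConst"
    if isinstance(expr, Var):
        return delta, expr, gamma.get(expr.name, ANY)
    if isinstance(expr, Return):
        d1, e1, tau = infer(gamma, delta, expr.expr)
        return d1 | {tau}, Return(e1), "TVoid"
    if isinstance(expr, If):
        d1, c1, sigma = infer(gamma, delta, expr.cond)
        if not consistent(sigma, "TConst"):
            raise TypeError(f"condition has type {sigma}, expected TConst")
        d2, t1, tau_t = infer(gamma, d1, expr.then)
        d3, f1, tau_f = infer(gamma, d2, expr.orelse)
        # The condition is wrapped in a runtime cast to TConst.
        return d3, If(Cast(c1, "TConst"), t1, f1), union(tau_t, tau_f)
    raise NotImplementedError(type(expr).__name__)
```

With `x` of unknown type, `infer({"x": ANY}, frozenset(), If(Var("x"), Return(Const(1)), Const(2)))` accepts the program statically and defers the check on `x` to the inserted cast.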

5. Cast Performance Analysis

The rules presented in Fig. 9 introduce a new kind of expression, Cast(e, τ). Its runtime semantics is simple: when expression e evaluates to a value v, we check that v corresponds to type τ. If v conforms to τ, the result of evaluating Cast(e, τ) is v; otherwise a cast error is produced as the result.
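The runtime semantics of a cast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lama runtime: values are modeled by Python values, `TConst` and `TUnion` follow the names used in the paper, while `TString`, `TArr` and the `CastError` exception are hypothetical names introduced here.

```python
ANY = "?"  # the unknown type: every value conforms to it


class CastError(Exception):
    """Hypothetical stand-in for the cast error produced at runtime."""


def conforms(value, ty):
    """Check structurally whether a value corresponds to a type."""
    if ty == ANY:
        return True
    if ty == "TConst":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if ty == "TString":  # assumed name, for illustration only
        return isinstance(value, str)
    if isinstance(ty, tuple) and ty[0] == "TArr":  # assumed name
        return isinstance(value, list) and all(conforms(v, ty[1]) for v in value)
    if isinstance(ty, tuple) and ty[0] == "TUnion":
        return any(conforms(value, alt) for alt in ty[1])
    return False


def cast(value, ty):
    """Cast(e, τ) applied to the value v of e: yield v or fail."""
    if conforms(value, ty):
        return value
    raise CastError(f"{value!r} does not conform to {ty!r}")
```

Note that checking a nested type such as `("TArr", ("TArr", "TConst"))` walks the whole value, which is why such checks can become expensive for deeply nested data.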

A runtime check that a value corresponds to some type may be time-consuming, especially when the type and the expression are complex and deeply nested. Thus, we introduce an explicit syntax for parts of the code where we do not want casts to be inserted, like this:

fun mod(x :: ?, m :: ?) :: ? {
  #NoTypecheck {
    (if x < 0 then 0-x else x fi) % m
  }
}

The typechecker sees this annotation and completely ignores the annotated part of the code. The implementation of gradual typing for Lama offers three options for controlling the typechecking procedure:

• #NoTypecheck - excludes the annotated AST from typechecking entirely;

• #StaticTypecheck - disables cast insertion into the AST, but static checks are still enabled;

• #GradualTyping - enables cast insertion into the AST.
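The effect of these three annotations, including the fact that anything nested inside #NoTypecheck is ignored, can be sketched as follows. This is a hypothetical model in Python, not the implementation: the `Annotated` node and the flat representation of code fragments as strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NO_TYPECHECK = "#NoTypecheck"
    STATIC_TYPECHECK = "#StaticTypecheck"
    GRADUAL_TYPING = "#GradualTyping"


@dataclass
class Annotated:
    """Hypothetical AST node: a block of code under one annotation."""
    mode: Mode
    body: list


def effective_modes(node, mode=Mode.GRADUAL_TYPING):
    """Yield (fragment, statically_checked, casts_inserted) for each fragment.

    #GradualTyping is taken as the default mode, as stated in the paper.
    """
    if isinstance(node, Annotated):
        # Inside #NoTypecheck every nested annotation is ignored.
        inner = mode if mode is Mode.NO_TYPECHECK else node.mode
        for child in node.body:
            yield from effective_modes(child, inner)
    else:
        yield (node,
               mode is not Mode.NO_TYPECHECK,
               mode is Mode.GRADUAL_TYPING)


program = Annotated(Mode.STATIC_TYPECHECK, [
    "a",
    Annotated(Mode.GRADUAL_TYPING, ["b"]),
    Annotated(Mode.NO_TYPECHECK, ["c", Annotated(Mode.GRADUAL_TYPING, ["d"])]),
])

for fragment, checked, casts in effective_modes(program):
    print(fragment, checked, casts)
```

Here fragment `b` is both statically checked and guarded by casts, `a` is only statically checked, and `c` and `d` are skipped altogether.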

You can nest #StaticTypecheck and #GradualTyping annotations in order to enable or disable cast insertion during typechecking. There is no point in nesting type-related annotations inside #NoTypecheck, however, because the typechecker would ignore them completely.

With the full power of gradual types on one side and the unrestrained undefined behaviour of unchecked code on the other, let us use the interpreter mode of the Lama compiler to observe the slowdown in code execution. We will use the following sample code:

fun fibonacci(k) {
  if k == 0 then return 0
  elif k == 1 then return 1
  elif k < 0 then return -1
  else return fibonacci(k-1) + fibonacci(k-2)
  fi
}

write(fibonacci(read()))

It is not obvious where the casts are in this example, but in section 2 we noted that the + operator coerces both of its arguments to Const at runtime, so the appropriate casts from the unknown type to TConst are inserted. Hence, this code models the situation in which values frequently pass from the untyped part of the code to the typed part.
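To give a feeling for how many casts this program executes, the following Python sketch mirrors the function and counts the runtime checks. It is a model under assumptions, not the Lama interpreter: only the two casts around the + operator mentioned above are modeled, and `CastCounter` is a hypothetical helper.

```python
class CastCounter:
    """Hypothetical helper that performs and counts casts to TConst."""

    def __init__(self):
        self.count = 0

    def to_const(self, value):
        self.count += 1
        if not isinstance(value, int):
            raise TypeError(f"{value!r} is not a constant")
        return value


def fibonacci(k, casts):
    if k == 0:
        return 0
    elif k == 1:
        return 1
    elif k < 0:
        return -1
    else:
        # Both operands of + come from untyped calls, so each is cast.
        return (casts.to_const(fibonacci(k - 1, casts))
                + casts.to_const(fibonacci(k - 2, casts)))


casts = CastCounter()
print(fibonacci(10, casts), casts.count)  # prints "55 176"
```

The number of casts grows as fast as the number of recursive calls, which is why even a cheap check becomes visible in the total running time.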

Kryshtapovich V.S. Localized application of gradual typing. Trudy ISP RAN/Proc. ISP RAS, vol. 33, issue 3, 2021, pp. 61-76.

We compare this code wrapped in the #GradualTyping annotation, which is the default, with the same code wrapped in the #NoTypecheck annotation. Time is measured with the Unix time utility, so compile time is included in both measurements.

+----+-----------+-----------+
| n  | Untyped   | Typed     |
+----+-----------+-----------+
| 10 | 0m 0.119s | 0m 0.092s |
| 11 | 0m 0.097s | 0m 0.079s |
| 12 | 0m 0.088s | 0m 0.087s |
| 13 | 0m 0.094s | 0m 0.093s |
| 14 | 0m 0.091s | 0m 0.095s |
| 15 | 0m 0.086s | 0m 0.090s |
| 16 | 0m 0.092s | 0m 0.094s |
| 17 | 0m 0.095s | 0m 0.088s |
| 18 | 0m 0.093s | 0m 0.100s |
| 19 | 0m 0.102s | 0m 0.105s |
| 20 | 0m 0.106s | 0m 0.125s |
| 21 | 0m 0.124s | 0m 0.124s |
| 22 | 0m 0.132s | 0m 0.154s |
| 23 | 0m 0.162s | 0m 0.192s |
| 24 | 0m 0.208s | 0m 0.279s |
| 25 | 0m 0.284s | 0m 0.389s |
| 26 | 0m 0.416s | 0m 0.581s |
| 27 | 0m 0.593s | 0m 0.878s |
| 28 | 0m 0.909s | 0m 1.363s |
| 29 | 0m 1.467s | 0m 2.179s |
| 30 | 0m 2.326s | 0m 3.561s |
| 31 | 0m 3.659s | 0m 5.796s |
| 32 | 0m 5.977s | 0m 9.469s |
| 33 | 0m 9.477s | 0m14.108s |
| 34 | 0m15.981s | 0m24.799s |
| 35 | 0m26.933s | 0m43.855s |
| 36 | 0m42.236s | 1m 7.766s |
| 37 | 1m12.161s | 1m49.319s |
| 38 | 1m53.534s | 3m 0.748s |
| 39 | 3m18.046s | 4m54.461s |
| 40 | 5m17.664s | 7m54.811s |
+----+-----------+-----------+

The average of the slowdown

$$sd_n = \frac{\text{Typed}_n}{\text{Untyped}_n}$$

from the point of actual slowdown registered, n = 21, is:

$$\frac{1}{20}\sum_{n=21}^{40} sd_n = \frac{1}{20}\sum_{n=21}^{40} \frac{\text{Typed}_n}{\text{Untyped}_n} \approx 1.45.$$

As we can see, the section of code with gradual typing and runtime type checking enabled exhibits a slowdown of almost ×1.5. Thus, we have reproduced the result of article [2], but for Lama semantics and using this artificially small example.
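The averaging step can be checked with a short Python script. The timings below are the measurements for n = 21..40 from the table, converted to seconds; the variable names are ours, and the average is taken as the arithmetic mean of the per-n ratios.

```python
# Measured wall-clock times in seconds for n = 21..40 (from the table).
untyped = [0.124, 0.132, 0.162, 0.208, 0.284, 0.416, 0.593, 0.909,
           1.467, 2.326, 3.659, 5.977, 9.477, 15.981, 26.933, 42.236,
           72.161, 113.534, 198.046, 317.664]
typed = [0.124, 0.154, 0.192, 0.279, 0.389, 0.581, 0.878, 1.363,
         2.179, 3.561, 5.796, 9.469, 14.108, 24.799, 43.855, 67.766,
         109.319, 180.748, 294.461, 474.811]

# Per-n slowdown sd_n = Typed_n / Untyped_n.
slowdowns = [t / u for t, u in zip(typed, untyped)]

# Arithmetic mean of the slowdowns over the 20 measurements.
mean_slowdown = sum(slowdowns) / len(slowdowns)
print(f"{mean_slowdown:.2f}")  # prints "1.45"
```

Note that this is the mean of ratios; dividing the total typed time by the total untyped time would weight the largest n most heavily and give a somewhat different figure.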

6. Conclusion

We introduced a type system with the following properties:

• Monomorphic;

• Gradual.

It would be nice to add the following features to the type system:

• Polymorphism;

• Recursive types.

In future work, we would like to use type equations and Hindley-Milner style inference with a unification algorithm, as presented in [17] and [19].

It is worth mentioning that we reproduced the result of a recent article about industrial-level languages that use gradual types unsoundly [2]. We modeled the situation in which values constantly pass from the untyped part of a program to its typed parts and, as expected, observed a slowdown in execution.

In addition, we have provided a simple and powerful, yet dangerous, method of managing the tradeoff between type safety and execution performance: let the programmer choose the areas of code where extra performance is needed and those where static and runtime type safety guarantees are needed, either with #NoTypecheck or, better, with the #StaticTypecheck and #GradualTyping annotations. The idea goes further. It would be nice to introduce other kinds of static verification sections that programmers can apply as they see fit, for instance live variable analysis (#LiveVarAnalysis) or memory access safety. The programmer thus acquires a framework with a set of static verifiers and the ability to choose which guarantees matter most for a given piece of code. To sum up, the programmer controls compilation time and obtains code with the needed guarantees, all unified in one syntax.

Even though the soundness of the type system is still in question and should be proved or improved, several tests have been added to the codebase to check the type system, including tests that must fail to compile, runtime error tests and positive example tests. The introduced type system enhances the coding experience and points out at least the silly and obvious errors that programmers frequently make. Moreover, Lama's tooling has been extended with a logger that generates warning messages, mostly about case expression coverage. The implementation of gradual typing for the Lama language resides in a personal repository, in the branch named "GraduLama" [20].

References

[1] Jeremy G. Siek and Walid Taha. Gradual Typing for Functional Languages. In Proc. of the Seventh Workshop on Scheme and Functional Programming, 2006, pp. 81-92.

[2] Cameron Moy, Phuc C. Nguyen et al. Corpse reviver: sound and efficient gradual typing via contract verification. Proceedings of the ACM on Programming Languages, vol. 5, issue POPL, 2021, Article 53, 28 p.

[3] D. Boulytchev. JetBrains-Research/Lama source code. Available at https://github.com/JetBrains-Research/Lama, accessed 27/03/2021.

[4] Ingkarat Rak-amnouykit, Daniel McCrevan et al. Python 3 types in the wild: a tale of two type systems. In Proc. of the 16th ACM SIGPLAN International Symposium on Dynamic Languages (DLS 2020), 2020, pp. 57-70.

[5] Guido van Rossum, Ivan Levkivskyi. PEP 483 - The Theory of Type Hints. Available at https://www.python.org/dev/peps/pep-0483/, accessed 27/03/2021.

[6] Guido van Rossum, Jukka Lehtosalo, Lukasz Langa. PEP 484 - Type Hints. Available at https://www.python.org/dev/peps/pep-0484/, accessed 27/03/2021.

[7] Jukka Lehtosalo et al. Mypy: Optional Static Typing for Python. Available at https://github.com/python/mypy, accessed 27/03/2021.

[8] Pytype: A static type analyzer for Python code. Available at https://github.com/google/pytype, accessed 27/03/2021.

[9] Sam Tobin-Hochstadt, Vincent St-Amour et al. The Typed Racket Guide. Available at https://docs.racket-lang.org/tsguide/index.html, accessed 27/03/2021.

[10] D. Boulytchev. Lama language specification v. 1.10. Available at https://github.com/JetBrains-Research/Lama/blob/1.10/lama-spec.pdf, accessed 27/03/2021.

[11] R. Rivest. S-Expressions, 4/05/1997. Available at http://people.csail.mit.edu/rivest/Sexp.txt, accessed 29/03/2021.

[12] Amal Ahmed, Dustin Jamner et al. Theorems for free for free: parametricity, with and without types. Proceedings of the ACM on Programming Languages, vol. 1, issue ICFP, 2017, Article 39, 28 p.

[13] Jack Williams, J. Garrett Morris, and Philip Wadler. The root cause of blame: contracts for intersection and union types. Proceedings of the ACM on Programming Languages, vol. 2, issue OOPSLA, 2018, Article 134, 29 p.

[14] P. Wadler. A Complement to Blame. In Proc. of the 1st Summit on Advances in Programming Languages (SNAPL 2015), 2015, pp. 309-320.

[15] Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, vol. 74, no. 2, 1953, pp. 358-366.

[16] Benjamin C. Pierce. Types and Programming Languages. The MIT Press, 2002, 648 p.

[17] Giuseppe Castagna, Victor Lanvin et al. Gradual typing: a new perspective. Proceedings of the ACM on Programming Languages, vol. 3, issue POPL, 2019, Article 16, 32 p.

[18] Karla Ramirez Pulido, Jorge Luis Ortega-Arjona et al. Gradual Typing Using Union Typing with Records. Electronic Notes in Theoretical Computer Science, vol. 354, 2020, pp. 171-186.

[19] Yusuke Miyazaki, Taro Sekiyama, and Atsushi Igarashi. Dynamic type inference for gradual Hindley-Milner typing. Proceedings of the ACM on Programming Languages, vol. 3, issue POPL, 2019, Article 18, 29 p.

[20] V. Kryshtapovich. GraduLama source code. Available at https://github.com/kry127/Lama/tree/gradulama, accessed 27/03/2021.

Information about authors

Viktor Sergeevich KRYSHTAPOVICH, second-year master's student. Research interests: type systems, databases.
