code your own language
=> dictionary-of-programming-concepts.html dictionary-of-programming-concepts
=> build-your-own-freedom-lab.html build-your-own-freedom-lab
=> quasi.html quasi
=> which-language-should-i-use.html which-language-should-i-use
intro
so, youre interested in programming, but you want your own language to play with?
you have choices-- you can take a university course, there are some video lectures you can watch, you could get a giant book or start learning how to write context-free grammars in bnf or ebnf... those are all choices you have.
or you could keep it simple and have some fun-- then when you have your easy-to-design language, you can dress it up as much or as little as you want.
that isnt to say you should avoid planning-- its definitely good to write down some ideas before you start. if you dont have any ideas about what you want, you can just try stuff as you go along.
thats no way to create a great language, but many of the most popular languages in use didnt start out as great languages-- they started out simple.
if you want to create a great language, you should probably at least practice first. thats what this is really about. so this website will hopefully get you started.
why write a language?
people will tell you lots of reasons /not/ to write a language. theyll say you dont know how (which is actually a great reason to try it), and theyll say its too difficult, but this guide will show you that it isnt. they will also say there are too many languages already-- but people create new languages all the time. some people even create programming languages accidentally.
good reasons to write a language are:
* for fun
* for learning
* for practice
* to create a tool that is useful to you
* for teaching
a quick start
if youre doing this for fun or exploration, you can start immediately. you simply need to write a program that loads a text file and starts looking for certain words. that can be as simple as loading the file as a string, adding each individual character of the string to a second string, then examining the rightmost n characters to find out if they match a keyword.
thats so simple, even the script that displays this page does it.
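heres a minimal sketch of that character-by-character approach in python. the keyword list and the name scan() are made up for illustration. note that it matches a keyword anywhere, even in the middle of another word, which is fine for a first experiment:

```python
# hypothetical keywords for a first experiment
KEYWORDS = ["hello", "hw"]


def scan(source):
    """copy the source into a buffer one character at a time and check
    whether the rightmost characters of the buffer match a keyword."""
    buffer = ""
    found = []
    for ch in source:
        buffer += ch
        for keyword in KEYWORDS:
            # buffer[-n:] is the rightmost n characters of the buffer
            if buffer[-len(keyword):] == keyword:
                found.append(keyword)
    return found


print(scan("hw hello hw"))  # prints ['hw', 'hello', 'hw']
```

to scan a file instead of a string, pass it the contents of the file, for example scan(open("program.txt").read()), where program.txt is a placeholder name.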
if you want an even simpler way to start, just put one word on each line of a text file. read the file, and check if each line matches a keyword. if it does, run some code (you can do this with multiple 'if' statements.)
the key to this process is iteration-- start very simple, then when you want to make it more complicated or sophisticated, add to it and change it. now you already have enough to get started-- even if youre not very familiar with coding, you can learn how to open a file, read the lines, and do if statements.
the first program people usually learn is "hello world." you can start by creating a programming language with a single command called "helloworld" or "hw" or "hello", which does nothing other than say "hello, world" whenever the command is found. if its good enough for a first program-- its good enough for your first programming language.
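a sketch of that one-command language in python, using the one-word-per-line approach and plain if statements. run() is a hypothetical name; it collects its output in a list so the result is easy to check, and the loop at the bottom prints it:

```python
def run(lines):
    """a one-command language: each line holds one word, checked with if statements."""
    output = []
    for line in lines:
        word = line.strip()  # drop the newline and any stray spaces
        if word == "":
            continue  # ignore blank lines
        if word == "hw" or word == "hello" or word == "helloworld":
            output.append("Hello, world")
        else:
            output.append("command not found: " + word)
    return output


for text in run(["hw", "hello", "", "goodbye"]):
    print(text)
```

to run a real file, pass an open file to run(), for example run(open("program.txt")), since looping over a file gives one line at a time. program.txt is a placeholder name.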
if you want more ideas, keep reading.
what is a programming language?
take every programming language, and every fact about those languages. if you strike out each thing that doesnt exist or is not true across every language, you will have a list of **known requirements** for a programming language.
here are some things you can strike out:
* the language is entirely new, original or unique
* it talks directly to the cpu / hardware
* it is commercially viable
* it offers an abundance of higher math functions
* it is all purpose
* it is interpreted
* it is compiled
* it is fast
* it has more than 100 commands
none of these are requirements, they are assumptions. and while youre free to create a language that conforms to some or all of these assumptions, it is not necessary to do so to create a programming language.
historically, when you take a **collection of routines** and **give them names**, you are creating a programming language. today we distinguish between libraries, which are considered part of an existing language (such as the pygame library for python), and a programming language itself. once you start to create unique syntax and a new environment for those libraries to be used in, you are creating a new language-- however simple or unsophisticated it is.
more things to consider in advance
if you want to enjoy yourself, consider implementing your language in one that you already enjoy using or learning. if you use an easy implementation language, it will also be easier for others to help with or adapt your language.
there are 2 or 3 languages you will be dealing with while creating your language:
1. the language you design
2. the language you use to implement it
3. the language you translate to
if your language is interpreted, then 2 and 3 will be the same. if your language is compiled, then 2 and 3 may or may not be different. if you eventually create a language that is self-hosting, then 1 and 2 could become the same.
an interpreted language runs as it is translated. running a program written in an interpreted language requires the interpreter to be present (installed or available) every time the program runs.
whether your language is interpreted or compiled depends more on 3 (the language you translate to) than on the language you use for implementation. for example, python is interpreted, but you can use it to implement a compiler.
ultimately, the main feature of a compiler is that it translates the entire program before the output (compiled) program runs. python, though interpreted, can translate an entire program to another language.
if this sounds too complicated to start out with, dont worry. youll figure some of it out along the way.
when an interpreted language is easier to implement
when you first start to implement a language, you will probably find that its easier to begin by interpreting. our initial "get started" advice points more to an interpreter than a compiler. this is a great way to get a quick, very simplified idea of how languages are implemented.
when a compiled language is easier to implement
you may find out fairly quickly that while interpreted languages are less trouble at the outset, a compiled (fully translated) language is easier to implement once it has "complex" features like loops, conditionals and functions. other basic features, such as variables, input and output, will probably be trivial to implement whether interpreted or compiled.
iteration
no matter how you plan ahead, your first language isnt going to be everything you want it to be. youre going to find limitations along the way that only an experienced designer is likely to avoid. if your goal is to make a sophisticated language, youre going to need experience first. so enjoy making your first language and dont set the goals ridiculously high.
if you fall in love with creating languages, you may decide to create others. thats when you start really planning and taking it seriously. you might create several simple languages before one convinces you to stick with it. even then, you might think of other things you want to do with a different language project.
features
language is the essence of modern computing. all computers do is move numeric information from one place to another. you put a number in one place, you put another nearby, and you tell the computer to add them. drawing a line on the screen involves copying a lot of numbers to a certain area-- the numbers are figured out by simpler calculations in one or more loops.
you dont have to know how every function works to use it-- no single person invented all the routines that we use every day. even before the compiler was invented, codes that were routinely used for common tasks were kept in a dictionary-- an automatic dictionary (a library) became the basis of the first compiler. it almost certainly contained routines written by several operators. the use of modern programming libraries is similar.
if you choose an implementation language that already has many of the features you want, you will be able to implement many features simply by calling them from that language. some people will tell you this is cheating, but only if they havent thought it through. the authors of python never wrote pygame; it is a separate library built around sdl, which is written in yet another language (c). so if python doesnt "cheat" by calling features from the pygame library, why would you be cheating if you called pygame features from your own language?
still, wherever your language differs from the implementation language, it will be up to you to code those differences.
commands first
grace hopper, the pioneer of compiled languages, taught university mathematics and still believed that most users would prefer words to "symbols" for writing code. the truth is somewhere in the middle, with most popular languages containing some of both. still, a language with too much fiddly syntax is, at the very least, more to learn and more to explain when teaching-- especially when doing so for the first time.
some of todays popular languages avoid syntax as much as is reasonable-- they even move the "problem" of punctuation from the keyboard to colourful boxes, so instead of typing things like { (, , ,) ; } you just fill out a tiny form for each command. this saves typing and tedious errors over things that are unimportant.
of course punctuation in programming languages isnt all bad; it can help separate commands and improve the readability of a program. if you are accustomed to and comfortable with it, punctuation can save typing and help you focus on more important things. still, the simpler the syntax, the easier it is to parse the program-- especially at the outset.
this guide recommends, against the common advice of many guides like it, that you start with commands rather than with parsing. think about what you want your language to do-- what youre going to use it for. think about your implementation language-- hopefully it is either one you are comfortable using, or one you enjoy learning more about.
if you like doing graphics, you will probably want your language to have a few graphics commands. if you add a command to interact with the shell or run other programs, you can use your language to automate computer tasks. if you are going to process data, you probably want commands to open files and split text into arrays of some kind. once again, if these features are available in the language you use to implement your own, they will probably be trivial to add. but they dont always need to be available. you can implement a split() command, even if there is not one present.
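for example, a split command can be written with nothing but a loop and string concatenation. my_split() is a hypothetical name; with an explicit one-character separator it should give the same result as pythons own str.split(separator):

```python
def my_split(text, separator):
    """split text on a one-character separator using only a loop."""
    pieces = []
    current = ""
    for ch in text:
        if ch == separator:
            pieces.append(current)  # the separator ends the current piece
            current = ""
        else:
            current += ch
    pieces.append(current)  # whatever is left is the last piece
    return pieces


print(my_split("red,green,,blue", ","))  # prints ['red', 'green', '', 'blue']
```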
try writing the code for a few commands first-- start with a command that says "hello, world" and call it hw or hello. now add a few other commands. write a quick and simple parser that looks for the name of the command, and runs it. voila-- you have a computer language.
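one way to organize this is to write each command as a function and keep the functions in a dictionary, so the "parser" only has to look up the first word of each line. the command names and functions below are illustrative assumptions; sqrt shows a command borrowed directly from pythons math module, as described earlier. there is no error handling for missing arguments:

```python
import math


# each command of the new language is an ordinary python function
def hello(args):
    return "Hello, world"


def shout(args):
    return " ".join(args).upper()


def square_root(args):
    # borrowed: pythons math.sqrt does the real work
    return str(math.sqrt(float(args[0])))


# hypothetical command names mapped to the functions that run them
COMMANDS = {"hw": hello, "hello": hello, "shout": shout, "sqrt": square_root}


def run_line(line):
    """a very small parser: the first word names the command, the rest are arguments."""
    words = line.split()
    if not words:
        return ""
    name, args = words[0], words[1:]
    command = COMMANDS.get(name)
    if command is None:
        return "command not found: " + name
    return command(args)


for line in ["hw", "shout hi there", "sqrt 16", "dance"]:
    print(run_line(line))
```

adding a command is then just a matter of writing one more function and adding one more entry to COMMANDS.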
do you know what your parser is going to be like? stop-- you can write multiple parsers later. create some of the other parts of your language first. many languages that have started with a parser have gone back and made multiple changes or versions of it anyway. there are several ways to integrate a parser with your program, and not much advantage to starting out with that aspect of your language.
start by creating some functions-- maybe what you really want to do is create a library. no? you want the language to differ from the implementation language? ok, better keep making your language then.
choosing an implementation language
you can create a language in c, c++, javascript, python, bash-- languages that have conditional statements and commands for processing strings generally work fine for creating a language.
of these, bash is probably the most tedious to create a language in. others, like python, may have additional tools just for this purpose. you dont necessarily need those additional tools, but they exist.
if the goal is to compile programs written in your language to standalone utilities, you may want to use c or c++. the tradeoff is that people who run your compiler will need all the required libraries installed, and they will need to worry about things like 32bit or 64bit, intel or arm, etc.
if you want to keep this simple, a ubiquitous interpreted language like javascript or python will make things easier. you can "compile" (fully translate) your own language to javascript or python-- then youll need a browser or python interpreter to run it.
using python will make it easier to have loops, work with local files on the computer, and automate tasks. using javascript will make it easier to create gui programs that run in the browser, interface with websites (sometimes) and rich text-- things that are more tedious in python.
these are just examples. of course this guide recommends python for creating your language, but its important to know there are a variety of languages you can use, and that you can choose one that you prefer.
language tasks (basically categories of features)
assuming that this is a simple language, you might not provide ways to explicitly create object classes. a lot of people who are first learning python start with things like function definitions (which are technically objects) or just if statements. you might prefer to start with:
* output statements (text, graphics)
* variable assignment
* input statements (keyboard or mouse)
* simple math functions (addition, subtraction, division)
output statements work fine with constants, while input statements practically require a variable to copy input to. math functions are only useful without a variable if theyre tied to an output statement.
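a small interpreter covering those four tasks might look like the sketch below. the command names (say, set, ask, add) are hypothetical, every value is either a whole number or a single word of text, and the read parameter lets you replace the keyboard with another function while testing:

```python
def constant(token):
    """turn a token into a whole number if possible, otherwise keep it as text."""
    try:
        return int(token)
    except ValueError:
        return token


def run(lines, read=input):
    variables = {}
    output = []

    def value(token):
        # a known variable name gives its value; anything else is a constant
        return variables[token] if token in variables else constant(token)

    for line in lines:
        words = line.split()
        if not words:
            continue
        command, args = words[0], words[1:]
        if command == "say":  # output: works with constants or variables
            output.append(" ".join(str(value(a)) for a in args))
        elif command == "set":  # assignment: set name value
            variables[args[0]] = value(args[1])
        elif command == "ask":  # input: needs a variable to store the answer
            variables[args[0]] = constant(read())
        elif command == "add":  # math: add name a b stores a + b in name
            variables[args[0]] = value(args[1]) + value(args[2])
        else:
            output.append("command not found: " + command)
    return output


for text in run(["set x 2", "add y x 3", "say x plus 3 is y"]):
    print(text)  # prints: 2 plus 3 is 5
```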
if you have implemented output statements, variable assignment, input and math functions as an interpreter, you might want to change your program so that it outputs code instead of running it, before going any further.
instead of saying:
if token == "hw": print("Hello, world")
you probably want this instead, to compile:
quot = chr(34)  # chr(34) is the double quote character
if token == "hw": print("print(" + quot + "Hello, world" + quot + ")")
or if you have a file open for output:
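quot = chr(34)  # double quote
dosnl = chr(13) + chr(10)  # dos/windows newline: carriage return, then line feed
unxnl = chr(10)  # unix newline: line feed only
if token == "hw": outputfile.write("print(" + quot + "Hello, world" + quot + ")" + unxnl)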
interpreting the code runs it as it is translated; compiling the code simply outputs the translated code as a new program.
it is possible, but much trickier, to implement features such as *loops*, *conditionals* and *function definition* in an interpreter. when you compile to a language that already has these features, you simply output the syntax for them: when you want a for loop, you output the syntax for a for loop in the language you are translating to.
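here is a sketch of a tiny compiler that translates a whole program to python before anything runs. the repeat and end commands are hypothetical additions; repeat is translated straight into a python for loop, so the looping itself is left to python:

```python
def compile_program(lines):
    """translate every line of the (hypothetical) language into python source code."""
    output = []
    indent = 0
    for line in lines:
        words = line.split()
        if not words:
            continue
        command = words[0]
        pad = "    " * indent  # python uses indentation for blocks
        if command == "hw":
            output.append(pad + 'print("Hello, world")')
        elif command == "say":
            # repr() produces a correctly quoted python string literal
            output.append(pad + "print(" + repr(" ".join(words[1:])) + ")")
        elif command == "repeat":
            # repeat n becomes a python for loop
            output.append(pad + "for _ in range(" + str(int(words[1])) + "):")
            indent += 1
        elif command == "end":
            if indent == 0:
                raise SyntaxError("end without repeat")
            if output[-1].endswith(":"):
                output.append(pad + "pass")  # python does not allow an empty loop body
            indent -= 1
        else:
            raise SyntaxError("command not found: " + command)
    return "\n".join(output) + "\n"


python_code = compile_program(["hw", "repeat 2", "say hi there", "end"])
print(python_code)
```

to use the result, write it to a file (out.py, say) and run that file with the python interpreter, or pass the string to pythons built-in exec() while experimenting.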
getting around to parsing
the more features you plan out ahead of time (implementing them first is optional), the more ready you will be to add a parser.
there are several ways to add a parser to your language:
* you can modify one from an existing language implementation
* you can download one and tailor it to your needs
* you can write a grammar in bnf or ebnf and generate one with a parser generator (not the friendliest option, but it can produce high-quality parsers)
* you can use parser features or libraries from a robust, extensive language like python
* you can visit youtube for a tutorial on writing a simple parser
* you can start small, then elaborate on your own
these are all possibilities, though the easiest (unless youre already familiar with other tools related to parsing) is probably modifying an existing one or starting small.
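if you want a feel for the grammar-based approach without a parser generator, a grammar can also be followed by hand. the sketch below writes a tiny arithmetic grammar in ebnf (as a comment) and gives each rule its own method, a technique called recursive descent. the grammar and all names are illustrative assumptions, and the tokenizer simply skips any character that is not a digit or an operator, where a real language would report an error:

```python
import re

# grammar in ebnf (illustrative only):
#   expr = term , { ( "+" | "-" ) , term } ;
#   term = number , { ( "*" | "/" ) , number } ;


def tokenize(text):
    # whole numbers and the four operators; everything else is skipped
    return re.findall(r"\d+|[-+*/]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):  # expr = term { ("+" | "-") term }
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):  # term = number { ("*" | "/") number }
        value = self.number()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.number()
            else:
                value /= self.number()
        return value

    def number(self):
        token = self.take()
        if token is None or not token.isdigit():
            raise SyntaxError("expected a number, got " + repr(token))
        return int(token)


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected " + repr(parser.peek()))
    return value


print(evaluate("2 + 3 * 4"))  # prints 14
```

because term is handled inside expr, multiplication and division bind more tightly than addition and subtraction, just as the grammar says.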
for starting small, try this:
1. first write a program that loads a text file with one command per line, like this:
hw
hw
hw
each time it says hw, make it say "Hello, world" and each time it says anything else, either ignore it or give an error: "command not found."
2. now write a program that loads a text file but treats separate words individually. you can do this with the split() method in python, but its probably better to read each line (or the whole file) one character at a time:
hw hw hw
hw hw
again, each time it says hw, make it say "Hello, world" and each time it says anything else, either ignore it or give an error: "command not found."
3. now write a program that does the same as before, but keeps reading until it encounters a semicolon ; or newline:
hw ; hw ; hw
hw ; hw
this is one way of dealing with commands that have "parameters" or options, like print:
hw ; print hi ; hw
print there ; hw
4. bonus points: modify your program (keep a backup of the working version, or use a gitlab or notabug repo) so that it keeps reading past a space when the space is inside quotes:
hw ; print "hi there" ; hw
print "hello" ; hw
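steps 2 to 4 can be combined into one character-by-character reader, sketched below. tokenize() and run() are hypothetical names; spaces end a word, semicolons and newlines end a command, and double quotes keep spaces inside a word. because commands are grouped by semicolons, a line like hw hw hw is read as one command (hw) with two extra words, not as three separate commands as in step 2:

```python
def tokenize(source):
    """read source one character at a time; return a list of commands,
    each command being a list of words."""
    commands = []  # finished commands
    words = []     # words of the command currently being read
    word = ""      # characters of the word currently being read
    in_quotes = False
    for ch in source:
        if in_quotes:
            if ch == '"':
                in_quotes = False  # closing quote
            else:
                word += ch  # step 4: spaces inside quotes stay in the word
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch in " \t\r":
            # step 2: a space ends the current word
            if word:
                words.append(word)
                word = ""
        elif ch in ";\n":
            # step 3: a semicolon or newline ends the current command
            if word:
                words.append(word)
                word = ""
            if words:
                commands.append(words)
                words = []
        else:
            word += ch
    # the end of the file also ends the last word and command
    if word:
        words.append(word)
    if words:
        commands.append(words)
    return commands


def run(source):
    output = []
    for words in tokenize(source):
        name, args = words[0], words[1:]
        if name == "hw":
            output.append("Hello, world")
        elif name == "print":
            output.append(" ".join(args))
        else:
            output.append("command not found: " + name)
    return output


for text in run('hw ; print "hi there" ; hw\nprint "hello" ; hw'):
    print(text)
```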
what about guis and 3d and ai and other "advanced" features?
whether youre talking about windows, bsd, macos or gnu/linux, these features are usually provided in one of three ways: as a 3rd-party library (as with sdl or opengl), as a call to a system library (as with some calls to the windows or macos libraries), or as a call to native or standard libraries (as in python).
whether it is a "thick binding" that abstracts the syntax of the call, or a "thin binding" that is closer to the internal structure of the library, it is ultimately a function call (or method call, if you use objects.)
so when you talk about adding these features, in many instances youre talking about calling functions or methods. if you dont want to write your own 3d engine or your own gui, you can do what many people do and include a library for it thats already been written.
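to show the difference between a thin and a thick binding on a small scale, here are two hypothetical commands that both call pythons standard time.strftime(). the thin one passes the users words straight through, so users of your language must learn the strftime % codes; the thick one hides those details behind a fixed format:

```python
import time


def cmd_clock(args):
    """thin binding: the users words go straight to time.strftime,
    so they must already know its % codes (for example: clock %H:%M)."""
    return time.strftime(" ".join(args))


def cmd_today(args):
    """thick binding: a friendlier command that hides the format codes."""
    return time.strftime("%Y-%m-%d")


# hypothetical command names for a language that wraps the time module
COMMANDS = {"clock": cmd_clock, "today": cmd_today}

print(COMMANDS["today"]([]))
print(COMMANDS["clock"](["%H:%M"]))
```

either way, the command is ultimately just a function call into a library someone else already wrote.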
happy coding!