NML Says

Open Source Development 11 - Wrapping Up

Model Solutions

The Code Repository

The code itself can be found in a git repository at https://codeberg.org/arosano/librarya.git.

Example 1. README.md
# Library a

Copyright (c) 2025 Niels Müller Larsen

## Content 
Functions, and a test suite for them, relating
to classical encryption ciphers and language
analysis. So far only unigrams, bigrams, and
trigrams have been implemented.

The project has been used as an exercise
in a BSc(Hons) program in Informatics.
This is a model solution.

The `data` directory contains sample texts:
a short text for verification, and some longer
texts from Project Gutenberg. They have been
converted by `iconv` from ISO-8859-1 to UTF-8.

The `ngramming.py` script reads text from standard
input and computes its uni-, bi-, or trigram
frequencies. The results are written to standard
output.

`ngramming.sh` is a shell script that calls
`ngramming.py` with the relevant parameters.

Calling sequence:

`./ngramming.sh 1 poe.txt`

This will create a unigram frequency file,
`data/poe.txt_unigram.txt`, from the `poe.txt`
file. Input files MUST be in the `data`
directory, where output will also be placed.

## License

Licensed under the BSD-3 license. Please
refer to the LICENSE document in this repository.
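The README says that the Gutenberg texts were converted from ISO-8859-1 to UTF-8 with iconv (for example `iconv -f ISO-8859-1 -t UTF-8`). Readers without iconv can do the same conversion with the minimal Python sketch below. The helper names latin1_to_utf8 and convert_file are hypothetical and are not part of librarya. Every byte is a valid ISO-8859-1 character, so the decoding step cannot fail. The sketch assumes the input really is ISO-8859-1 and has not already been converted to UTF-8.

```python
from pathlib import Path

def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Each byte 0x00-0xFF maps to exactly one ISO-8859-1 character,
    # so this decode never raises UnicodeDecodeError.
    return data.decode("iso-8859-1").encode("utf-8")

def convert_file(src: str, dst: str) -> None:
    """Read src as ISO-8859-1 and write its UTF-8 version to dst."""
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))

if __name__ == "__main__":
    sample = "Müller".encode("iso-8859-1")   # b'M\xfcller'
    print(latin1_to_utf8(sample))            # b'M\xc3\xbcller'
```

Running a UTF-8 file through the conversion a second time would garble its non-ASCII characters. Convert each file only once.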
Example 2. LICENSE
BSD-3 License

Copyright © 2025 Niels Müller Larsen

Redistribution and use in source and binary forms, with
or without modification, are permitted provided that the
following conditions are met:

1.  Redistributions of source code must retain the above 
    copyright notice, this list of conditions and the 
    following disclaimer.

2.  Redistributions in binary form must reproduce the 
    above copyright notice, this list of conditions and
    the following disclaimer in the documentation and/or
    other materials provided with the distribution.

3.  Neither the name of the copyright holder nor the names
    of its contributors may be used to endorse or promote
    products derived from this software without specific 
    prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 
“AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 
POSSIBILITY OF SUCH DAMAGE.
Example 3. Compilation of Our Work - testsuite.py
'''
    testsuite.py
    
    Copyright (c) 2025 Niels Müller Larsen
    Licensed under the BSD-3 License,
    please refer to the LICENSE document
'''

import unittest
from liba import *

class Testing(unittest.TestCase):
    '''
    OSD.9.1
    Copyright (c) 2025 Niels Müller Larsen
    Licensed under the BSD-3 License,
    please refer to the LICENSE document
    '''
    def test_vigenE0(self):
        self.assertEqual(vigenE('DUH', 'THEY DRINK THE TEA'), 'WBLBXYLHRWBLWYH')

    '''
    OSD.9.2
    Copyright (c) 2025 Niels Müller Larsen
    Licensed under the BSD-3 License,
    please refer to the LICENSE document
    '''
    def test_vigenD0(self):
        self.assertEqual(vigenD('DUH', 'WBLBXYLHRWBLWYH'), 'THEYDRINKTHETEA')


    '''
    OSD.9.5
    Copyright (c) 2025 Niels Müller Larsen
    Licensed under the BSD-3 License,
    please refer to the LICENSE document
    '''
    def test_n_gram0(self):
        s = 'The quick brown fox jumps over the lazy dog'
        n = 3
        dn = {
            'AZY': 1,
            'BRO': 1,
            'CKB': 1,
            'DOG': 1,
            'ELA': 1,
            'EQU': 1,
            'ERT': 1,
            'FOX': 1,
            'HEL': 1,
            'HEQ': 1,
            'ICK': 1,
            'JUM': 1,
            'KBR': 1,
            'LAZ': 1,
            'MPS': 1,
            'NFO': 1,
            'OVE': 1,
            'OWN': 1,
            'OXJ': 1,
            'PSO': 1,
            'QUI': 1,
            'ROW': 1,
            'RTH': 1,
            'SOV': 1,
            'THE': 2,
            'UIC': 1,
            'UMP': 1,
            'VER': 1,
            'WNF': 1,
            'XJU': 1,
            'YDO': 1,
            'ZYD': 1
        }
        self.assertEqual(n_gram(s, n), dn)

    def test_format_n_gram0(self):
        s = 'haha'
        n = 1
        so = "'A':\t50.00\n'H':\t50.00\n"
        self.assertEqual(format_n_gram(n_gram(s, n)), so)

if __name__ == '__main__':
    unittest.main()
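Writing the expected dictionary in test_n_gram0 by hand is tedious and error-prone. As a cross-check, it can be regenerated independently with collections.Counter. The sketch below assumes the same preparation as cryptoPrep: upper-casing, then removing non-word characters (`\W` is equivalent to `[^\w]`). The name ngram_counts is hypothetical, and the function is not part of the test suite.

```python
import re
from collections import Counter

def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count overlapping n-grams after cryptoPrep-style preparation."""
    s = re.sub(r"\W", "", text.upper())
    # a string of length L has L - n + 1 overlapping n-grams
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

if __name__ == "__main__":
    c = ngram_counts("The quick brown fox jumps over the lazy dog")
    print(len(c), sum(c.values()), c["THE"])   # 32 33 2
```

The 35 prepared letters yield 33 trigrams. Only THE occurs twice, which leaves 32 distinct keys. Comparing `dict(ngram_counts(s))` with the literal in the test catches mistyped keys or wrong counts.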
Example 4. Compilation of Our Work - liba.py
'''
    Copyright © 2025 Niels Müller Larsen
    Licensed under the BSD-3 License,
    please refer to the LICENSE document
'''

import re

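def cryptoPrep(s):
    '''
        Prepares a string for encryption or analysis by
        upper-casing it and removing all non-word characters
    '''
    s = s.upper()
    # [^\w] keeps letters, digits, and underscores, including
    # non-ASCII letters such as Æ, Ø, and Å
    s = re.sub(r'[^\w]', '', s)
    return s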

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def caesarD(s, key):
    '''
        OSD.8.15
        Decrypts a string encrypted with Caesar's cipher
    '''
    return caesarE(s, -key)

def caesarE(s, key):
    '''
        OSD.8.14 updated
        Encrypts a string with Caesar's cipher
    '''
    s = cryptoPrep(s)
    ci = ''
    key = key % len(alphabet)
    if key == 0:
        return s
    for ch in s:
        if ch not in alphabet:
            # digits, underscores, and non-ASCII letters pass through unchanged
            ci += ch
            continue
        ci += alphabet[(alphabet.index(ch) + key) % len(alphabet)]
    return ci

def vig(key, s, enc=True):
    '''
        Encrypts/decrypts a string with the Vigenère cipher
        s is a string
        key is an alphabetic key (string)
        enc is True to encrypt, False to decrypt
        returns a string
    '''
    s = cryptoPrep(s)
    key = key.upper()
    out = ''

    ki = 0
    for char in s:
        if char not in alphabet:
            # characters outside A-Z pass through and do not consume a key letter
            out += char
            continue
        keychar = key[ki % len(key)]
        ki += 1
        j = alphabet.index(char)
        if enc:
            j += alphabet.index(keychar)
        else:
            j -= alphabet.index(keychar)
        j %= len(alphabet)
        out += alphabet[j]
    return out

def vigenD(key, s):
    '''
        OSD.9.2
        Decrypts a string encrypted with the Vigenère cipher
    '''
    return vig(key, s, False)

def vigenE(key, s):
    '''
        OSD.9.1
        Encrypts a string with the Vigenère cipher
    '''
    return vig(key, s, True)

def n_gram(s, n):
    '''
        OSD.9.5
        Creates a dictionary with n_gram frequencies
        given a string input
    '''
    s = cryptoPrep(s)
    dic = {}
    for i in range(len(s)-(n-1)):
        item = s[i:i+n]

        if item in dic:
            dic[item] += 1
        else:
            dic[item] = 1
    return dic

def format_n_gram(dic):
    '''
        OSD.A.0
        Formats a dictionary of n_gram frequencies
        for prettyprinting: one line per n_gram with its
        percentage of the total, sorted by n_gram.
        dic holds n_gram counts as returned
        by n_gram(s, n)
    '''
    # sort input
    dic = sort_dic_key(dic)

    # sum the counts to find the total
    count = sum(dic.values())
    if count == 0:
        # empty input: avoid division by zero
        return ''

    # iterate over keys and values to create frequency table
    s = ''
    for key, value in dic.items():
        s += f"'{key}':\t{100*value/count:5.2f}\n"
    return s

def sort_dic_key(dic):
    '''
        OSD.8.3
        Returns a copy of the dictionary sorted by key,
        for prettyprinting
    '''
    return dict(sorted(dic.items(), key=lambda item: item[0]))
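The shift arithmetic in vig() is c = (p + k) mod 26 for encryption and p = (c - k) mod 26 for decryption. Here p, c and k are the alphabet positions of the plaintext, ciphertext and key letters, and the key repeats along the text. The minimal, independent sketch below applies the same technique and checks it against the test vector from testsuite.py. It assumes the input has already been reduced to the letters A-Z, and the names vigenere and caesar are hypothetical rather than librarya functions. It also shows that a Caesar shift by key is a Vigenère encryption whose one-letter key is the letter at position key in the alphabet.

```python
from itertools import cycle
from string import ascii_uppercase as ALPHABET

def vigenere(key: str, s: str, enc: bool = True) -> str:
    """Vigenère cipher on text consisting only of the letters A-Z."""
    sign = 1 if enc else -1
    out = []
    # cycle() repeats the key letters alongside the text
    for ch, k in zip(s, cycle(key)):
        j = (ALPHABET.index(ch) + sign * ALPHABET.index(k)) % len(ALPHABET)
        out.append(ALPHABET[j])
    return "".join(out)

def caesar(s: str, key: int) -> str:
    """Caesar shift expressed as a Vigenère cipher with a one-letter key."""
    return vigenere(ALPHABET[key % len(ALPHABET)], s)

if __name__ == "__main__":
    print(vigenere("DUH", "THEYDRINKTHETEA"))          # WBLBXYLHRWBLWYH
    print(vigenere("DUH", "WBLBXYLHRWBLWYH", False))   # THEYDRINKTHETEA
    print(caesar("HELLO", 3))                          # KHOOR
```

Unlike vig(), this sketch does not let characters outside A-Z pass through. It is meant only as a cross-check on prepared text.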

The Solution Itself

The Python code that solves the exercise follows here.

Example 5. librarya/ngramming.py
'''
    ngramming.py
    
    Copyright (c) 2025 Niels Müller Larsen
    Licensed under the BSD-3 License,
    please refer to the LICENSE document
'''
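import sys
from liba import n_gram, format_n_gram

def print_ngrams(n):
    '''
        Reads text from standard input and writes its
        n-gram frequency table to standard output
    '''
    s = sys.stdin.read()
    dic = n_gram(s, n)
    so = format_n_gram(dic)
    sys.stdout.write(so)

if __name__ == '__main__':
    # Expect exactly one argument: the n-gram degree as a positive integer
    if len(sys.argv) != 2 or not sys.argv[1].isdigit() or int(sys.argv[1]) < 1:
        sys.exit(f'Usage: {sys.argv[0]} n  (1 = unigrams, 2 = bigrams, 3 = trigrams)')
    print_ngrams(int(sys.argv[1]))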
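ngramming.py takes its degree from the first command-line argument. Python's standard argparse module could validate that argument and print a usage message instead. The sketch below restricts n to 1, 2, or 3 to match the README. The function name parse_degree is hypothetical.

```python
import argparse

def parse_degree(argv=None) -> int:
    """Parse the n-gram degree (1, 2, or 3) from the command line."""
    parser = argparse.ArgumentParser(
        description="Write n-gram frequencies of standard input to standard output.")
    # type=int converts the argument; choices rejects anything but 1-3
    parser.add_argument("n", type=int, choices=[1, 2, 3],
                        help="1 = unigrams, 2 = bigrams, 3 = trigrams")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv).n

if __name__ == "__main__":
    print(parse_degree(["2"]))   # 2
```

On invalid input, argparse prints an error to standard error and exits with status 2. The value returned by parse_degree() would replace the manual sys.argv handling.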

The solution runs in an Alpine Linux container, where it is invoked through the following shell script.

Example 6. librarya/ngramming.sh
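#!/usr/bin/env sh
# Usage: ./ngramming.sh n file
# Writes the n-gram frequencies of data/file to data/file_<suffix>.txt
DATA=data
if [ $# -ne 2 ]; then
    echo "Usage: $0 n file" >&2
    echo "where n = 1, 2, or 3, and file is an input file in the '${DATA}' directory" >&2
    exit 1
fi

DEGREE=$1
INPUT=$2

# Map the degree to the output suffix and reject any other value
case "$DEGREE" in
    1) SUFFIX=unigram ;;
    2) SUFFIX=bigram ;;
    3) SUFFIX=trigram ;;
    *) echo "$0: n must be 1, 2, or 3" >&2; exit 1 ;;
esac

if [ ! -f "${DATA}/${INPUT}" ]; then
    echo "$0: ${DATA}/${INPUT} not found" >&2
    exit 1
fi

# Redirect the input file instead of piping it through cat
python3 ngramming.py "$DEGREE" < "${DATA}/${INPUT}" > "${DATA}/${INPUT}_${SUFFIX}.txt"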
Example 7. Execution Example
./ngramming.sh 1 sample.txt
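Each line of the resulting frequency file (here data/sample.txt_unigram.txt) holds a quoted n-gram, a colon, a tab, and the n-gram's percentage, for example 'A':\t50.00. Because format_n_gram uses a field width of 5, small values get a leading space. Such a file can be read back into a dictionary for later analysis, for instance when comparing a ciphertext's frequencies with those of a sample text. The minimal sketch below assumes exactly this line format, and read_frequencies is a hypothetical helper, not part of librarya.

```python
import re

# One line per n-gram, e.g. "'A':\t50.00" or "'B':\t 5.00"
LINE = re.compile(r"^'(.+)':\t\s*(\d+\.\d+)$")

def read_frequencies(lines):
    """Parse frequency-table lines into an {ngram: percent} dictionary."""
    freqs = {}
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m:  # lines that do not match the format are skipped
            freqs[m.group(1)] = float(m.group(2))
    return freqs

if __name__ == "__main__":
    table = "'A':\t50.00\n'H':\t50.00\n"
    print(read_frequencies(table.splitlines()))   # {'A': 50.0, 'H': 50.0}
```

With a real file, `read_frequencies(open('data/sample.txt_unigram.txt', encoding='utf-8'))` passes the file's lines directly, since a text file iterates line by line.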

The Docker Part

The following Dockerfile is meant to be built, according to the README.md, from the project directory, i.e. the parent directory of the librarya repository documented above.

Example 8. The Generating Dockerfile
FROM alpine
# one layer; --no-cache fetches a fresh package index without storing it in the image
RUN apk add --no-cache git python3 nano
WORKDIR /home/inthecrypt
COPY librarya .

The Docker image is Alpine Linux with git, Python 3, and nano installed. The final COPY directive copies the contents of the librarya directory into the working directory /home/inthecrypt of the image. This includes the .git directory, unless a .dockerignore file excludes it. It should therefore be possible to issue a git pull from /home/inthecrypt inside the container to fetch updates from the repository server Codeberg. A git remote -v will show the URL of the repository if you need to check it.

The Docker image may be built locally from the CLI with commands like those in Example 9 below. Alternatively, it may be pulled from Docker Hub (https://hub.docker.com) on your CLI with

docker pull arosano0/classiccrypt

and then run as shown in Example 10 below.

Example 9. Generating Commands Locally
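docker build -t arosano0/classiccrypt .
docker run -it --name classiccryptc \
    -v /home/nml/testdata:/home/inthecrypt/data \
    arosano0/classiccrypt

Here arosano0/classiccrypt is the name (tag) of the image, and classiccryptc is the name of the container. Replace /home/nml/testdata with a data directory of your own. Mounting it on /home/inthecrypt/data hides the sample texts that were copied into the image's data directory for as long as the container runs. During testing you may restart the container like this, again from the CLI: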

Example 10. Restart Sequence
docker stop classiccryptc
docker rm classiccryptc

docker run -it --name classiccryptc \
    -v /home/nml/testdata:/home/inthecrypt/data \
    arosano0/classiccrypt

Wrapping Up

Q & A

What did we do right? What could we have done better? Focus on code?

Thank You and Goodbye

“Alvaida!” (Goodbye!) “Dhanyabad!” (Thank you!)