Getting started with Kotlin for Data Science and AI

Subscribe to my newsletter and never miss my upcoming articles

source: kotlinlang.org

Data science is a quite exciting field, you are trying to figure out some patterns in your data, make some cool plots from your data and also gather some insights from it. If you do not know much about data science simply understand it as a field covering data engineering, data analysis, machine learning, visualization, and many more. Just take an example, I have a data which is huge for now let’s say something around 50 GB it would take me almost 5 minutes to load this kind of data. What if I made some error in my code and have to execute it again. I wait for 5 minutes again, so it makes sense to use ‘notebooks’ here. I can execute a code cell, see the output and then run another code cell. To, do this in Python or R we use Jupyter Notebooks. We will further see the tools Kotlin is equipped with to make Data Science a lot easier.

Tools

Here instead of an integrated development environments (IDE) that can handle all the development tasks in a single tool we would use similar tools called notebooks as discussed earlier. Notebooks let users conduct research and store it in a single environment. In a notebook, you can write narrative text next to a code block, execute the code block, and see the results in any format that you need: output text, tables, data visualization, and so on. Kotlin provides very easy integration with two popular notebooks: Jupyter and Apache Zeppelin, which both allow you to write and run Kotlin code blocks making the process a lot easier. In this blog we would not be covering how you could use Apache Zeppelin but you can check that out for yourself here.

Setting up the Jupyter Kernel

Let us now see how you would set up a Jupyter kernel for development with Kotlin. For code execution, Jupyter uses the concept of kernels — components that run separately and execute the code upon request, for example, when you click Run in a notebook.

There is a kernel that the Jupyter team maintains themselves — IPython for running the Python code. However, there are other community-maintained kernels for different languages. Among them is the Kotlin kernel for Jupyter notebooks. With this kernel, you can write and run Kotlin code in Jupyter notebooks and use third-party data science frameworks written in Java and Kotlin.

Before setting up the kernel, I expect you to have Java 8 and Conda and/or pip installed on your system.

  • Conda
conda install -c jetbrains kotlin-jupyter-kernel

Run the above command to install the latest stable release through conda

  • Pip
pip install kotlin-jupyter-kernel
  • Source

You could also install it directly from its GitHub repo too

git clone https://github.com/Kotlin/kotlin-jupyter.git
cd kotlin-jupyter
./gradlew install

Once the kernel is installed you can use any of these command

  • jupyter console --kernel=kotlin

  • jupyter notebook

  • jupyter lab

Once in, to start using kotlin kernel inside Jupyter Notebook or JupyterLab create a new notebook with kotlin kernel.

Libraries

Fortunately, there are already plenty of frameworks written in Kotlin for data science. There are even more frameworks written in Java, which is perfect as they can be called from Kotlin code seamlessly.

Below are two short lists of libraries that you may find useful for data science.

Kotlin Libraries

  • kotlin-statistics is a library providing extension functions for exploratory and production statistics. It supports basic numeric list/sequence/array functions (from sum to skewness), slicing operators (such as countBy, simpleRegressionBy), binning operations, discrete PDF sampling, naive bayes classifier, clustering, linear regression, and much more.

  • kmath is a library inspired by NumPy. This is by far my favorite library and the most useful according to me. This library supports algebraic structures and operations, array-like structures, math expressions, histograms, streaming operations, a wrapper around commons-math and koma, and more.

  • krangl is a library inspired by Python’s pandas. This library provides functionality for data manipulation using a functional-style API; it also includes functions for filtering, transforming, aggregating, and reshaping tabular data. This creates an entry point for you to start with data science and AI allowing you to easily load tabular data.

  • lets-plot is a plotting library for statistical data written in Kotlin. Lets-Plot is multi platform and can be used not only with JVM, but also with JS and Python too. Understand it as cross platform Matplotlib.

Java Libraries

Kotlin provides very easy and seamless interoperability with Java which means you could use all of Java’s libraries too! There are a lot of Java libraries which can aid you to plot graphs, due to mathematical calculations, apply Deep Learning and lot more, you can find them all here.

Lets-Plot for Kotlin

It would be best now to see some ways to use some indispensable libraries namely Lets-Plot and Numpy. Let’s get started with Lets-Plot.

Lets-Plot for Kotlin is a Kotlin API for the Lets-Plot library — an open-source plotting library for statistical data written entirely in Kotlin. Lets-Plot was built on the concept of layered graphics implemented in the ggplot2 package for R. Lets-Plot for Kotlin is tightly integrated with the Kotlin kernel for Jupyter notebooks. Once you have the Kotlin kernel installed and enabled, add the following line to a Jupyter notebook:

%use lets-plot

And that’s it you can now use all of lets-plot functions and create beautiful graphs.

Numpy

**KNumpy (Kotlin Bindings for NumPy**) is a Kotlin library that enables calling NumPy functions from the Kotlin code. NumPy is a popular package for scientific computing with Python. It provides powerful capabilities for multi-dimensional array processing, linear algebra, Fourier transform, random numbers, and other mathematical tasks.

KNumpy provides statically typed wrappers for NumPy functions. Thanks to the functional capabilities of Kotlin, the API of KNumpy is very similar to the one for NumPy. This lets developers that are experienced with NumPy easily switch to KNumpy. Here are two equal code samples in Python and Kotlin.

# Python

import numpy as np
a = np.arange(15).reshape(3, 5)

print(a.shape == (3, 5))        # True
print(a.ndim == 2)              # True
print(a.dtype.name)             # 'int64'

b = (np.arange(15) ** 2).reshape(3, 5)
// Kotlin 

import org.jetbrains.numkt.*

fun main() {
    val a = arange(15).reshape(3, 5)

    println(a.shape.contentEquals(intArrayOf(3, 5))) // true
    println(a.ndim == 2)                             // true
    println(a.dtype)                                 // class java.lang.Integer

    // create an array of ints, we square each element and the shape to (3, 5) 
    val b = (arange(15) `**` 2).reshape(3, 5)
}

An important thing to not is that unlike Python, Kotlin is a statically typed language. This lets you avoid entire classes of runtime errors with KNumpy, the Kotlin compiler detects them at earlier stages.

Concluding

And that was just the very basics of what Kotlin could do in the field of Data Science and AI, we have still not seen building neural networks in Kotlin and maybe cover that in some other post. But you just saw the power of Kotlin and how it extends to Data Science too, you also saw the vast array of tools and libraries to make your work a lot easier .

About Me

Hi everyone I am Rishit Dagli

LinkedIn — linkedin.com/in/rishit-dagli-440113165/

Website — rishit.tech

If you want to ask me some questions, report any mistake, suggest improvements, give feedback you are free to do so via the chat box on the website or by mailing me at —

No Comments Yet