Building a Real-Time Note-Taking App

Mar 27, 2024

I’m a student, and even when I’m taking notes in class I often miss some of what my teacher says. So I built a website that transcribes your classes or meetings and then gives you a summary of what was said.

In this step-by-step tutorial, you will learn how to create a real-time note-taking app that uses the Web Speech API for speech-to-text transcription and the Cohere platform for text summarization.

What We Will Cover

Setting up a Flask server with Socket.IO for real-time communication
Creating a simple UI for audio recording and display
Integrating the Web Speech API for converting speech to text in real-time
Using Cohere’s API to summarize the transcription

Prerequisites

Before we begin, make sure you have the following:

Python with the required libraries installed:

pip install flask flask-socketio cohere

Basic understanding of Python, JavaScript, and web development
An API key from Cohere (create an account and you get one for free)

Step 1: Setting Up the Flask Server

Your server is the backbone of your application. It serves web content, handles real-time events, and interacts with the Cohere API for summarization.

from flask import Flask, render_template
from flask_socketio import SocketIO, emit
import cohere

app = Flask(__name__)
# cors_allowed_origins="*" accepts connections from any origin (fine for local development)
socketio = SocketIO(app, cors_allowed_origins="*")
# Replace 'cohere-api-key' with your own Cohere API key
cohere_client = cohere.Client('cohere-api-key')


# Serve the page; Flask looks for index.html inside the templates folder
@app.route('/')
def index():
    return render_template('index.html')


# Handle 'summarize' events sent by the browser
@socketio.on('summarize')
def handle_summarization(data):
    text = data['text']
    try:
        response = cohere_client.generate(
            model='command',
            prompt=f"Summarize this text: {text}",
            max_tokens=50,
            temperature=0.5
        )
        summary = response.generations[0].text
        emit('summarization_result', {'summary': summary})
    except Exception as e:
        emit('summarization_error', {'error': str(e)})


# Run the Flask application
if __name__ == '__main__':
    socketio.run(app, debug=True)

Flask is used to create our web server.
SocketIO allows real-time bi-directional communication between web clients and the server.
The index() function maps to the root URL and renders our HTML.
The handle_summarization() function listens for a 'summarize' event emitted by the JavaScript in the browser, passes the text to the Cohere API, and sends back either a 'summarization_result' or a 'summarization_error' event.

Step 2: Creating the Frontend with HTML and JavaScript

Now let’s focus on the index.html file, which Flask expects to find in a templates folder next to your server script. This is where we create a user interface for recording and summarizing speech, and use JavaScript to interact with the Web Speech API and our server.

UI Elements

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Note Taking App</title>
    <style>
/* UI Style Components! */
        body {
            font-family: 'Arial', sans-serif;
            text-align: center;
            background-color: #F0F8FF;
            margin: 0;
            padding: 20px;
        }
        h1 {
            color: #FF69B4; 
        }
        button {
            padding: 10px 20px;
            font-size: 16px;
            background-color: #FFB6C1; 
            color: #fff;
            border: none;
            border-radius: 5px;
            margin: 10px;
            cursor: pointer;
            transition: background-color 0.3s ease;
        }
        button:hover {
            background-color: #FF69B4; /* Hot pink */
        }
        #transcription, #summary {
            margin: 20px auto;
            padding: 10px;
            background-color: #FFF;
            border-radius: 5px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
            width: 80%;
            min-height: 100px;
            overflow-wrap: break-word;
            border: 2px dashed #FFB6C1; 
            color: #333;
        }
    </style>
    <!-- Loading the Socket.IO client -->
    <script src="//cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.1/socket.io.js"></script>
</head>
<body>
    <!-- Adding the buttons -->
    <h1>Note Taking App 🎙️</h1>
    <button onclick="startRecording()">Start Recording 🎤</button>
    <button onclick="stopRecording()">Stop Recording 🛑</button>
    <button onclick="summarizeText()">Summarize ✨</button>
    <div id="transcription">Transcription will appear here...</div>
    <div id="summary">Summary will appear here...</div>

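All the code in the style block sets the colours and layout of the elements to make them aesthetically pleasing.
The buttons (Start Recording, Stop Recording, and Summarize) trigger JavaScript functions to control audio capture and summarization.
The JavaScript from Step 3 goes inside a <script> tag placed after these elements, followed by the closing </body> and </html> tags.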

Step 3: Interactivity and Real-Time Features

This application is real-time and interactive thanks to the Web Speech API, Socket.IO, and Cohere.

Speech Recognition in Action

The Web Speech API is a JavaScript API that allows web developers to integrate speech recognition (SpeechRecognition) and speech synthesis (SpeechSynthesis) into their applications. This API enables developers to create hands-free user experiences, accessibility features, voice-controlled applications, and more.
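
Not every browser exposes SpeechRecognition, and some only offer it under the webkit prefix, so it can help to check before trying to record. Here is a minimal sketch of that check. The helper name getSpeechRecognition and its globalObj parameter are made up for this example; in the page you would pass window.

```javascript
// Returns the SpeechRecognition constructor if one is available, else null.
// `globalObj` is a parameter (an assumption of this sketch) so the check can
// also run outside a browser; in the page, pass `window`.
function getSpeechRecognition(globalObj) {
    return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In the browser: tell the user up front instead of failing when they click record.
if (typeof window !== 'undefined' && !getSpeechRecognition(window)) {
    alert('Speech recognition is not supported in this browser.');
}
```

With this in place, startRecording() could return early when the helper gives back null rather than throwing an error.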

Socket.IO is a library that enables real-time, bidirectional communication between web clients and servers. Its features include fallbacks for environments without WebSocket support, automatic reconnection, and broadcasting messages to multiple clients.

Cohere is an AI platform that offers natural language processing capabilities, including text generation and summarization. It provides developers with tools to build applications that can understand and generate human-like text.

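// Holds the active SpeechRecognition instance (null when not recording)
var recognition = null;

function startRecording() {
    // Chrome exposes the API under the webkit prefix
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    recognition = new SpeechRecognition();
    recognition.continuous = true;      // keep listening through pauses
    recognition.lang = 'en-US';
    recognition.interimResults = true;  // show words while they are still being recognized

    recognition.onresult = function(event) {
        // event.results holds every segment of this session, so join them all
        // instead of showing only the most recent one
        var transcript = '';
        for (var i = 0; i < event.results.length; i++) {
            transcript += event.results[i][0].transcript;
        }
        document.getElementById('transcription').innerText = transcript;
    };

    recognition.onerror = function(event) {
        console.error('Speech recognition error', event.error);
    };

    recognition.start();
}

function stopRecording() {
    if (recognition) {
        recognition.stop();
        recognition = null;
    }
}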

startRecording: When you click the ‘Start Recording’ button, the Web Speech API begins transcribing your speech immediately and displays it in the transcription area.
stopRecording: Clicking ‘Stop Recording’ ends the speech recognition session.
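
Because interimResults is turned on, each onresult event can carry both finalized segments and provisional guesses that may still change. If you want to treat the two differently (for example, to show interim words in a lighter colour), one option is to split them. This is only a sketch: the splitTranscript helper is a name invented for this example, and it assumes results has the same shape as event.results.

```javascript
// Splits recognition results into finalized text and still-changing interim text.
// `results` is expected to look like event.results: an array-like list whose
// items have an `isFinal` flag and the best alternative at index 0.
function splitTranscript(results) {
    var finalText = '';
    var interimText = '';
    for (var i = 0; i < results.length; i++) {
        var segment = results[i][0].transcript;
        if (results[i].isFinal) {
            finalText += segment;
        } else {
            interimText += segment;
        }
    }
    return { finalText: finalText, interimText: interimText };
}

// Possible use inside startRecording():
// recognition.onresult = function(event) {
//     var parts = splitTranscript(event.results);
//     document.getElementById('transcription').innerText = parts.finalText + parts.interimText;
// };
```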

Summarization with Cohere

@socketio.on('summarize')
def handle_summarization(data):
    text = data['text']
    # Log only the first 100 characters to keep the console readable
    print("Received text for summarization:", text[:100])
    try:
        response = cohere_client.generate(
            model='command',
            prompt=f"You are a note-taking app. Summarize these notes: {text}",
            max_tokens=300,
            temperature=0.5
        )
        summary = response.generations[0].text
        print("Generated summary:", summary)
        emit('summarization_result', {'summary': summary})
    except Exception as e:
        print(f"Error summarizing text: {e}")
        emit('summarization_error', {'error': str(e)})

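summarizeText: After recording your notes, click ‘Summarize’ to send the transcribed text to the Flask server as a 'summarize' event. There, handle_summarization() in app.py calls Cohere’s generate feature with a prompt that asks it to condense your notes into a summary, which is sent back and displayed on the page. This is an expanded version of the handler from Step 1, with logging, a more specific prompt, and a higher max_tokens limit.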
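
The browser side of this exchange could look something like the following. Treat it as a minimal sketch, not the app's exact code: the setupSummarization helper and the status messages are invented for this example, while the event names and the text, summary, and error keys match the server code above. It assumes the script runs after the transcription and summary divs exist on the page.

```javascript
// Wires the Summarize button to the server.
// `socket` is a Socket.IO client (io() in the page); `transcriptionEl` and
// `summaryEl` are the two <div> elements from index.html.
function setupSummarization(socket, transcriptionEl, summaryEl) {
    // Show the summary when the server sends it back
    socket.on('summarization_result', function (data) {
        summaryEl.innerText = data.summary;
    });
    // Show the error message if the Cohere call failed
    socket.on('summarization_error', function (data) {
        summaryEl.innerText = 'Could not summarize: ' + data.error;
    });

    // The returned function is what the button's onclick should call
    return function summarizeText() {
        var text = transcriptionEl.innerText.trim();
        if (!text) {
            summaryEl.innerText = 'Nothing to summarize yet.';
            return;
        }
        summaryEl.innerText = 'Summarizing...';
        socket.emit('summarize', { text: text });
    };
}

// In the browser, expose summarizeText globally so onclick="summarizeText()" finds it.
if (typeof window !== 'undefined' && typeof io !== 'undefined') {
    window.summarizeText = setupSummarization(
        io(),
        document.getElementById('transcription'),
        document.getElementById('summary')
    );
}
```

Passing the socket and elements in as arguments keeps the function easy to try out with stand-in objects before connecting it to the real page.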

Conclusion

Together, these pieces make up a real-time note-taking application built with Flask, Cohere, Socket.IO, and the Web Speech API.