Text recognition in Flutter

How to create a scanner app for text recognition in Flutter

In this article we are going to see how to create an application that recognizes text in images and displays it on the screen. To do this we are going to use the camera plugin to open a live feed from the device's camera, and the google_mlkit_text_recognition package to extract the text from the captured image.

📽 Video version available on YouTube and Odysee

Create a new Flutter application:

flutter create --platforms android,ios flutter_text_recognition

Modify the pubspec.yaml file so that it uses the aforementioned plugins as well as permission_handler, which will be used to request permission to use the camera:

name: flutter_text_recognition
description: A new Flutter project.
publish_to: 'none'
version: 1.0.0+1

environment:
  sdk: '>=2.18.2 <3.0.0'

dependencies:
  camera: ^0.10.0+4
  flutter:
    sdk: flutter
  google_mlkit_text_recognition: ^0.4.0
  permission_handler: ^10.2.0

dev_dependencies:
  flutter_test:
    sdk: flutter
  flutter_lints: ^2.0.0

flutter:
  uses-material-design: true

If you try to run the app now, you may get an error because the google_mlkit_text_recognition package requires you to target a higher Android SDK. If that is the case, modify the android block of build.gradle (android/app/build.gradle) with the following values:

// Modify only the android block as follows:
android {
    compileSdkVersion 33
    ndkVersion flutter.ndkVersion

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

    kotlinOptions {
        jvmTarget = '1.8'
    }

    sourceSets {
        main.java.srcDirs += 'src/main/kotlin'
    }

    defaultConfig {
        // TODO: Specify your own unique Application ID (https://developer.android.com/studio/build/application-id.html).
        applicationId "com.example.flutter_text_recognition"
        // You can update the following values to match your application needs.
        // For more information, see: https://docs.flutter.dev/deployment/android#reviewing-the-build-configuration.
        minSdkVersion 22
        targetSdkVersion 32
        versionCode flutterVersionCode.toInteger()
        versionName flutterVersionName
    }

    buildTypes {
        release {
            // TODO: Add your own signing config for the release build.
            // Signing with the debug keys for now, so `flutter run --release` works.
            signingConfig signingConfigs.debug
        }
    }
}

Notice that I have set targetSdkVersion to 32. The reason is that there is a bug right now that causes problems with version 33.

Camera permission

The first step will be to get permission to open the camera. To do this we must define said permission, first in the AndroidManifest.xml file (android/app/src/main/AndroidManifest.xml):

<uses-permission android:name="android.permission.CAMERA" />

And also in Info.plist (ios/Runner/Info.plist):

<key>NSCameraUsageDescription</key>
<string>This app needs access to your camera in order to scan text.</string>

Additionally, if we look at the permission_handler setup instructions for iOS, we must also define the permissions that we are going to use in Podfile (ios/Podfile):

# Modify the last part of this file to look like this:

post_install do |installer|
  installer.pods_project.targets.each do |target|
    flutter_additional_ios_build_settings(target)

    # Add the following block:
    target.build_configurations.each do |config|
      config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = $iOSVersion

      config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [
        '$(inherited)',

        ## dart: PermissionGroup.camera
        'PERMISSION_CAMERA=1',
      ]

    end
  end
end

Now modify the lib/main.dart file with the following code:

import 'package:flutter/material.dart';
import 'package:permission_handler/permission_handler.dart';

void main() {
  runApp(const App());
}

class App extends StatelessWidget {
  const App({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Text Recognition',
      theme: ThemeData(
        primarySwatch: Colors.blue,
      ),
      home: const MainScreen(),
    );
  }
}

class MainScreen extends StatefulWidget {
  const MainScreen({super.key});

  @override
  State<MainScreen> createState() => _MainScreenState();
}

class _MainScreenState extends State<MainScreen> {
  bool _isPermissionGranted = false;

  late final Future<void> _future;

  @override
  void initState() {
    super.initState();

    _future = _requestCameraPermission();
  }

  @override
  Widget build(BuildContext context) {
    return FutureBuilder(
      future: _future,
      builder: (context, snapshot) {
        return Scaffold(
          appBar: AppBar(
            title: const Text('Text Recognition Sample'),
          ),
          body: Center(
            child: Container(
              padding: const EdgeInsets.only(left: 24.0, right: 24.0),
              child: Text(
                _isPermissionGranted
                    ? 'Camera permission granted'
                    : 'Camera permission denied',
                textAlign: TextAlign.center,
              ),
            ),
          ),
        );
      },
    );
  }

  Future<void> _requestCameraPermission() async {
    final status = await Permission.camera.request();
    _isPermissionGranted = status == PermissionStatus.granted;
  }
}

In this code block we define a Future that is executed by a FutureBuilder. This Future calls the _requestCameraPermission() method, which requests the camera permission through Permission.camera.request() and sets the _isPermissionGranted state variable depending on whether the permission was granted or not.

In this case the FutureBuilder widget is especially useful, since requesting a permission is an asynchronous operation that we want to perform at the start of the app's execution. It is important to note that the Future used by the FutureBuilder must be declared as a field of the class; this way, if the widget is rebuilt, the Future already holds the previous result and is not executed again.

This is a very simplified way of asking for a permission; you can see how I ask for permissions and handle all the possible scenarios in this article.
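
If you want to go a bit further, here is a minimal sketch of handling the "permanently denied" case with permission_handler. Note that requestCameraOrOpenSettings is a hypothetical helper, not part of this tutorial's code:

import 'package:permission_handler/permission_handler.dart';

/// Hypothetical helper: requests the camera permission and, if the user has
/// permanently denied it, sends them to the app settings page.
Future<bool> requestCameraOrOpenSettings() async {
  final status = await Permission.camera.request();

  if (status.isPermanentlyDenied) {
    // The request dialog can no longer be shown; the user has to enable the
    // permission manually from the system settings.
    await openAppSettings();
    return false;
  }

  return status.isGranted;
}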

Show the camera preview

Now we are going to take the camera feed and render it on the screen so that the user can point to the text they want to scan.

One possible approach would be to show the preview inside the Scaffold once we have the camera permission, but because of the camera's aspect ratio it may not look very good that way. For this reason, I prefer to show the camera preview behind the Scaffold, like this:

import 'package:camera/camera.dart';
import 'package:flutter/material.dart';
import 'package:permission_handler/permission_handler.dart';

void main() {
  runApp(const App());
}

class App extends StatelessWidget {
  const App({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Text Recognition',
      theme: ThemeData(
        primarySwatch: Colors.blue,
      ),
      home: const MainScreen(),
    );
  }
}

class MainScreen extends StatefulWidget {
  const MainScreen({super.key});

  @override
  State<MainScreen> createState() => _MainScreenState();
}

// Add the WidgetsBindingObserver mixin
class _MainScreenState extends State<MainScreen> with WidgetsBindingObserver {
  bool _isPermissionGranted = false;

  late final Future<void> _future;

  // Add this controller to be able to control the camera
  CameraController? _cameraController;

  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addObserver(this);

    _future = _requestCameraPermission();
  }

  // We should stop the camera once this widget is disposed
  @override
  void dispose() {
    WidgetsBinding.instance.removeObserver(this);
    _stopCamera();
    super.dispose();
  }

  // Starts and stops the camera according to the lifecycle of the app
  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    if (_cameraController == null || !_cameraController!.value.isInitialized) {
      return;
    }

    if (state == AppLifecycleState.inactive) {
      _stopCamera();
    } else if (state == AppLifecycleState.resumed &&
        _cameraController != null &&
        _cameraController!.value.isInitialized) {
      _startCamera();
    }
  }

  @override
  Widget build(BuildContext context) {
    return FutureBuilder(
      future: _future,
      builder: (context, snapshot) {
        return Stack(
          children: [
            // Show the camera feed behind everything
            if (_isPermissionGranted)
              FutureBuilder<List<CameraDescription>>(
                future: availableCameras(),
                builder: (context, snapshot) {
                  if (snapshot.hasData) {
                    _initCameraController(snapshot.data!);

                    return Center(child: CameraPreview(_cameraController!));
                  } else {
                    return const LinearProgressIndicator();
                  }
                },
              ),
            Scaffold(
              appBar: AppBar(
                title: const Text('Text Recognition Sample'),
              ),
              // Set the background to transparent so you can see the camera preview
              backgroundColor: _isPermissionGranted ? Colors.transparent : null,
              body: _isPermissionGranted
                  ? Column(
                      children: [
                        Expanded(
                          child: Container(),
                        ),
                        Container(
                          padding: const EdgeInsets.only(bottom: 30.0),
                          child: Center(
                            child: ElevatedButton(
                              onPressed: null,
                              child: const Text('Scan text'),
                            ),
                          ),
                        ),
                      ],
                    )
                  : Center(
                      child: Container(
                        padding: const EdgeInsets.only(left: 24.0, right: 24.0),
                        child: const Text(
                          'Camera permission denied',
                          textAlign: TextAlign.center,
                        ),
                      ),
                    ),
            ),
          ],
        );
      },
    );
  }

  Future<void> _requestCameraPermission() async {
    final status = await Permission.camera.request();
    _isPermissionGranted = status == PermissionStatus.granted;
  }

  void _startCamera() {
    if (_cameraController != null) {
      _cameraSelected(_cameraController!.description);
    }
  }

  void _stopCamera() {
    if (_cameraController != null) {
      _cameraController?.dispose();
    }
  }

  void _initCameraController(List<CameraDescription> cameras) {
    if (_cameraController != null) {
      return;
    }

    // Select the first rear camera.
    CameraDescription? camera;
    for (var i = 0; i < cameras.length; i++) {
      final CameraDescription current = cameras[i];
      if (current.lensDirection == CameraLensDirection.back) {
        camera = current;
        break;
      }
    }

    if (camera != null) {
      _cameraSelected(camera);
    }
  }

  Future<void> _cameraSelected(CameraDescription camera) async {
    _cameraController = CameraController(
      camera,
      ResolutionPreset.max,
      enableAudio: false,
    );

    await _cameraController!.initialize();

    if (!mounted) {
      return;
    }
    setState(() {});
  }
}

In this block of code I have marked the required changes with comments. I have also added several methods at the end of the class related to camera management; they are explained in detail in the camera plugin's documentation.

Text Recognition

To finish this tutorial we get to the best part: recognizing the text and displaying it on the screen. First, we will create a new lib/result_screen.dart file with a widget that takes care of displaying the scanned text:

import 'package:flutter/material.dart';

class ResultScreen extends StatelessWidget {
  final String text;

  const ResultScreen({super.key, required this.text});

  @override
  Widget build(BuildContext context) => Scaffold(
        appBar: AppBar(
          title: const Text('Result'),
        ),
        body: Container(
          padding: const EdgeInsets.all(30.0),
          child: Text(text),
        ),
      );
}

Now, back in the lib/main.dart file, import the text recognition package (import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart';) and create a variable of type TextRecognizer at the top of the state class:

final textRecognizer = TextRecognizer();
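
By default TextRecognizer uses the Latin script model. If you expect text in another script you can pass it explicitly; here is a minimal sketch, assuming the script models bundled with google_mlkit_text_recognition:

import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart';

// Equivalent to the default: the Latin script model.
final textRecognizer = TextRecognizer(script: TextRecognitionScript.latin);

// Other script models (Chinese, Devanagari, Japanese, Korean) can be selected
// through the TextRecognitionScript enum in the same way.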

It is important to close this object in dispose(), alongside the cleanup we were already doing there:

@override
void dispose() {
  WidgetsBinding.instance.removeObserver(this);
  _stopCamera();
  textRecognizer.close();
  super.dispose();
}

Add the following method at the end of the class. It uses the File class and the ResultScreen widget, so you will also need import 'dart:io'; and import 'result_screen.dart'; at the top of the file:

Future<void> _scanImage() async {
  if (_cameraController == null) return;

  final navigator = Navigator.of(context);
  final scaffoldMessenger = ScaffoldMessenger.of(context);

  try {
    final pictureFile = await _cameraController!.takePicture();

    final file = File(pictureFile.path);

    final inputImage = InputImage.fromFile(file);
    final recognizedText = await textRecognizer.processImage(inputImage);

    await navigator.push(
      MaterialPageRoute(
        builder: (BuildContext context) =>
            ResultScreen(text: recognizedText.text),
      ),
    );
  } catch (e) {
    scaffoldMessenger.showSnackBar(
      const SnackBar(
        content: Text('An error occurred when scanning text'),
      ),
    );
  }
}

And use it as a callback for the button we added earlier:

// [...]
child: ElevatedButton(
  onPressed: _scanImage,
  child: const Text('Scan text'),
),
// [...]

In the _scanImage() method we first take a picture with the camera, then create an InputImage from the resulting file, and finally pass it through the scanner (the TextRecognizer object we defined at the beginning).

If all went well, we open the ResultScreen with the scanned text; if something failed, we show a SnackBar reporting the error.
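
Note that recognizedText contains more than the plain string. If you needed per-line results, a small sketch of iterating the structure exposed by google_mlkit_text_recognition could look like this:

// RecognizedText is organized as blocks -> lines -> elements.
for (final block in recognizedText.blocks) {
  for (final line in block.lines) {
    // Each line exposes its own text and its bounding box within the image.
    debugPrint('${line.text} -> ${line.boundingBox}');
  }
}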

Conclusion

And in this very simple way it is possible to build a text recognition system in Flutter. There are variations of what we have seen here: for example, instead of taking a photo from the camera's live feed we could pick a local image and run it through the scanner (a rough sketch of this is shown below), or even grab frames continuously from the camera and overlay the recognized text on the preview in real time.
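
As a rough sketch of the first variation, and assuming you add the image_picker package as an extra dependency, scanning an image picked from the gallery could look like this (scanImageFromGallery is a hypothetical helper):

import 'dart:io';

import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart';
import 'package:image_picker/image_picker.dart';

/// Hypothetical helper: lets the user pick an image from the gallery and
/// returns the recognized text, or null if nothing was picked.
Future<String?> scanImageFromGallery(TextRecognizer textRecognizer) async {
  final pickedFile =
      await ImagePicker().pickImage(source: ImageSource.gallery);
  if (pickedFile == null) return null;

  final inputImage = InputImage.fromFile(File(pickedFile.path));
  final recognizedText = await textRecognizer.processImage(inputImage);

  return recognizedText.text;
}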

You can find the source code of this project here.

Happy coding :)
