Markopolo Career - Life at Markopolo

Job Post: AI/ML Engineer—Text-to-Speech Systems

We're seeking a talented AI/ML Engineer to join our team in developing cutting-edge text-to-speech technology. You'll work at the forefront of speech synthesis, building systems that transform written text into natural, expressive, human-like speech. This role combines deep learning expertise with audio processing to create voice experiences that push the boundaries of what's possible in TTS.

Key Responsibilities:

Work with us to build a cutting-edge TTS model.

Model Development & Research

Design, train, and optimize neural TTS models, including acoustic models, vocoders, and end-to-end synthesis systems
Implement state-of-the-art architectures such as Transformer-based models, diffusion models, and neural vocoders (HiFi-GAN, WaveGrad, etc.)
Develop solutions for multi-speaker synthesis, voice cloning, and emotional/prosodic control
Research and implement techniques for low-latency inference and model compression

Data & Training Pipeline

Build robust data processing pipelines for audio preprocessing, text normalization, and phoneme conversion
Design and implement training infrastructure for large-scale model development
Create evaluation frameworks covering objective metrics (e.g., PESQ) and subjective quality assessment (e.g., MOS listening tests)
Manage voice data collection, annotation, and quality control processes

Production & Deployment

Optimize models for production deployment with a focus on latency, throughput, and resource efficiency
Implement streaming TTS capabilities and real-time synthesis
Develop APIs and integration points for TTS services
Monitor and improve model performance in production environments

Collaboration & Innovation

Collaborate with research teams on novel TTS architectures and techniques
Work with product teams to understand use cases and requirements
Contribute to technical documentation and knowledge sharing
Stay current with the latest developments in speech synthesis and neural audio generation

Education & Experience

Bachelor's degree in Computer Science, Electrical Engineering, or related field (Master's/PhD preferred)
3+ years of hands-on experience in machine learning or deep learning
Demonstrated experience with speech processing, TTS, or related audio ML applications

Technical Skills

Strong proficiency in Python and ML frameworks (PyTorch or TensorFlow)
Experience with speech processing libraries (librosa, torchaudio, scipy.signal)
Understanding of signal processing, spectrograms, and audio feature extraction
Familiarity with phonetics, linguistic concepts, and text processing for TTS
Experience with distributed training and GPU optimization

Core Competencies

Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, GANs, diffusion models)
Knowledge of TTS-specific concepts: mel-spectrograms, fundamental frequency, prosody modeling
Experience with version control (Git) and collaborative development
Strong problem-solving skills and attention to audio quality details

Preferred Qualifications

Published research or contributions to TTS/speech synthesis
Experience with specific TTS architectures (Tacotron, FastSpeech, VITS, Tortoise-TTS, etc.)
Knowledge of multiple languages and their phonetic systems
Experience with voice conversion and speaker adaptation techniques
Familiarity with audio codecs and compression techniques
Background in prosody modeling and expressive speech synthesis
Experience with edge deployment and on-device inference
Contributions to open-source speech/audio projects

What We Offer

Opportunity to work on state-of-the-art TTS technology
Access to cutting-edge computational resources and datasets
Collaborative environment with leading researchers in speech synthesis
Competitive compensation and benefits package
Professional development and conference attendance opportunities
Flexible work arrangements

Technical Stack

Languages: Python, C++ (for optimization)
Frameworks: PyTorch/TensorFlow, ONNX, TensorRT
Tools: Weights & Biases, Docker, Kubernetes
Audio Tools: Praat, Audacity, custom annotation tools
Cloud Platforms: AWS/GCP/Azure for training and deployment

This position offers the unique opportunity to shape the future of human-computer interaction through voice. If you're passionate about creating natural, expressive synthetic speech and want to work with a team pushing the boundaries of what's possible in TTS, we'd love to hear from you.

How to Apply:

Interested candidates are invited to submit their resume and a cover letter detailing their experience and qualifications to hr@markopolo.ai. Please include "AI/ML Engineer—Text-to-Speech Systems Application" in the subject line.

Join us and be a part of the AI revolution in digital marketing!