Job Post: AI/ML Engineer—Text-to-Speech Systems

We're seeking a talented AI/ML Engineer to join our team in developing cutting-edge text-to-speech (TTS) technology. You'll work at the forefront of speech synthesis, building systems that transform written text into natural, expressive, human-like speech. This role combines deep learning expertise with audio processing to create voice experiences that push the boundaries of what's possible in TTS.

Key Responsibilities:

Work with us to build a cutting-edge TTS model.

  1. Model Development & Research
  • Design, train, and optimize neural TTS models, including acoustic models, vocoders, and end-to-end synthesis systems.
  • Implement state-of-the-art architectures such as Transformer-based models, diffusion models, and neural vocoders (e.g., HiFi-GAN, WaveGrad).
  • Develop solutions for multi-speaker synthesis, voice cloning, and emotional/prosodic control.
  • Research and implement techniques for low-latency inference and model compression.
  2. Data & Training Pipeline
  • Build robust data processing pipelines for audio preprocessing, text normalization, and phoneme conversion.
  • Design and implement training infrastructure for large-scale model development.
  • Create evaluation frameworks covering objective metrics (e.g., PESQ) and subjective quality assessment (e.g., MOS listening tests).
  • Manage voice data collection, annotation, and quality control processes.
  3. Production & Deployment
  • Optimize models for production deployment with a focus on latency, throughput, and resource efficiency.
  • Implement streaming TTS capabilities and real-time synthesis
  • Develop APIs and integration points for TTS services
  • Monitor and improve model performance in production environments
  4. Collaboration & Innovation
  • Collaborate with research teams on novel TTS architectures and techniques
  • Work with product teams to understand use cases and requirements
  • Contribute to technical documentation and knowledge sharing
  • Stay current with the latest developments in speech synthesis and neural audio generation

Required Qualifications

  1. Education & Experience
  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field (Master's/PhD preferred)
  • 3+ years of hands-on experience in machine learning or deep learning
  • Demonstrated experience with speech processing, TTS, or related audio ML applications
  2. Technical Skills
  • Strong proficiency in Python and ML frameworks (PyTorch or TensorFlow)
  • Experience with speech processing libraries (librosa, torchaudio, scipy.signal)
  • Understanding of signal processing, spectrograms, and audio feature extraction
  • Familiarity with phonetics, linguistic concepts, and text processing for TTS
  • Experience with distributed training and GPU optimization
  3. Core Competencies
  • Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, GANs, diffusion models)
  • Knowledge of TTS-specific concepts: mel-spectrograms, fundamental frequency, prosody modeling
  • Experience with version control (Git) and collaborative development
  • Strong problem-solving skills and attention to audio quality details

Preferred Qualifications

  • Published research or contributions to TTS/speech synthesis
  • Experience with specific TTS architectures (Tacotron, FastSpeech, VITS, Tortoise-TTS, etc.)
  • Knowledge of multiple languages and their phonetic systems
  • Experience with voice conversion and speaker adaptation techniques
  • Familiarity with audio codecs and compression techniques
  • Background in prosody modeling and expressive speech synthesis
  • Experience with edge deployment and on-device inference
  • Contributions to open-source speech/audio projects

What We Offer

  • Opportunity to work on state-of-the-art TTS technology
  • Access to cutting-edge computational resources and datasets
  • Collaborative environment with leading researchers in speech synthesis
  • Competitive compensation and benefits package
  • Professional development and conference attendance opportunities
  • Flexible work arrangements

Technical Stack

  • Languages: Python, C++ (for optimization)
  • Frameworks: PyTorch/TensorFlow, ONNX, TensorRT
  • Tools: Weights & Biases, Docker, Kubernetes
  • Audio Tools: Praat, Audacity, custom annotation tools
  • Cloud Platforms: AWS/GCP/Azure for training and deployment

This position offers the unique opportunity to shape the future of human-computer interaction through voice. If you're passionate about creating natural, expressive synthetic speech and want to help advance the state of the art in TTS, we'd love to hear from you.

How to Apply:

Interested candidates are invited to submit their resume and a cover letter detailing their experience and qualifications to hr@markopolo.ai. Please include "AI/ML Engineer—Text-to-Speech Systems Application" in the subject line.

Join us and be a part of the AI revolution in digital marketing!