We're seeking a talented AI/ML Engineer to join our team in developing cutting-edge text-to-speech technology. You'll work at the forefront of speech synthesis, building systems that transform written text into natural, expressive, human-like speech. This role combines deep learning expertise with audio processing to create voice experiences that push the boundaries of what's possible in TTS.
Key Responsibilities:
Work with us to build a cutting-edge TTS model.
- Model Development & Research
  - Design, train, and optimize neural TTS models, including acoustic models, vocoders, and end-to-end synthesis systems
  - Implement state-of-the-art architectures such as Transformer-based models, diffusion models, and neural vocoders (HiFi-GAN, WaveGrad, etc.)
  - Develop solutions for multi-speaker synthesis, voice cloning, and emotional/prosodic control
  - Research and implement techniques for low-latency inference and model compression
- Data & Training Pipeline
  - Build robust data processing pipelines for audio preprocessing, text normalization, and phoneme conversion
  - Design and implement training infrastructure for large-scale model development
  - Create evaluation frameworks covering objective metrics (PESQ) and subjective quality assessment (MOS)
  - Manage voice data collection, annotation, and quality control processes
- Production & Deployment
  - Optimize models for production deployment with a focus on latency, throughput, and resource efficiency
  - Implement streaming TTS capabilities and real-time synthesis
  - Develop APIs and integration points for TTS services
  - Monitor and improve model performance in production environments
- Collaboration & Innovation
  - Collaborate with research teams on novel TTS architectures and techniques
  - Work with product teams to understand use cases and requirements
  - Contribute to technical documentation and knowledge sharing
  - Stay current with the latest developments in speech synthesis and neural audio generation
Required Qualifications
- Education & Experience
  - Bachelor's degree in Computer Science, Electrical Engineering, or related field (Master's/PhD preferred)
  - 3+ years of hands-on experience in machine learning or deep learning
  - Demonstrated experience with speech processing, TTS, or related audio ML applications
- Technical Skills
  - Strong proficiency in Python and ML frameworks (PyTorch or TensorFlow)
  - Experience with speech processing libraries (librosa, torchaudio, scipy.signal)
  - Understanding of signal processing, spectrograms, and audio feature extraction
  - Familiarity with phonetics, linguistics concepts, and text processing for TTS
  - Experience with distributed training and GPU optimization
- Core Competencies
  - Solid understanding of deep learning architectures (CNNs, RNNs, Transformers, GANs, Diffusion models)
  - Knowledge of TTS-specific concepts: mel-spectrograms, fundamental frequency, prosody modeling
  - Experience with version control (Git) and collaborative development
  - Strong problem-solving skills and attention to audio quality details
Preferred Qualifications
- Published research or contributions to TTS/speech synthesis
- Experience with specific TTS architectures (Tacotron, FastSpeech, VITS, Tortoise-TTS, etc.)
- Knowledge of multiple languages and their phonetic systems
- Experience with voice conversion and speaker adaptation techniques
- Familiarity with audio codecs and compression techniques
- Background in prosody modeling and expressive speech synthesis
- Experience with edge deployment and on-device inference
- Contributions to open-source speech/audio projects
What We Offer
- Opportunity to work on state-of-the-art TTS technology
- Access to cutting-edge computational resources and datasets
- Collaborative environment with leading researchers in speech synthesis
- Competitive compensation and benefits package
- Professional development and conference attendance opportunities
- Flexible work arrangements
Technical Stack
- Languages: Python, C++ (for optimization)
- Frameworks: PyTorch/TensorFlow, ONNX, TensorRT
- Tools: Weights & Biases, Docker, Kubernetes
- Audio Tools: Praat, Audacity, custom annotation tools
- Cloud Platforms: AWS/GCP/Azure for training and deployment
This position offers the unique opportunity to shape the future of human-computer interaction through voice. If you're passionate about creating natural, expressive synthetic speech and want to work with a team pushing the boundaries of what's possible in TTS, we'd love to hear from you.
How to Apply:
Interested candidates are invited to submit their resume and a cover letter detailing their experience and qualifications to hr@markopolo.ai. Please include "AI/ML Engineer—Text-to-Speech Systems Application" in the subject line.
Join us and be a part of the AI revolution in digital marketing!